Abstract

We observe an $N \times M$ matrix $Y_{ij} = s_{ij} + \xi_{ij}$ with $\xi_{ij} \sim \mathcal{N}(0,1)$ i.i.d. in $i, j$,
and $s_{ij} \in \mathbb{R}$. We test the null hypothesis $s_{ij} = 0$ for all $i, j$ against the alternative
that there exists some submatrix of size $n \times m$ with significant elements in
the sense that $s_{ij} \ge a > 0$. We propose test procedures and compute the asymptotic
detection boundary $a$ for which the maximal test errors tend to 0 as
$M \to \infty$, $N \to \infty$, $p = n/N \to 0$, $q = m/M \to 0$. We prove that this boundary
is asymptotically sharp minimax under some additional constraints. Relations with
other testing problems are discussed. We propose a testing procedure which adapts
to $(n,m)$, which is unknown but belongs to some given set, and compute the adaptive
sharp rates. The implementation of our test procedure on synthetic data shows
excellent behavior for sparse, not necessarily square matrices. We extend our sharp
minimax results in different directions: first, to Gaussian matrices with unknown
variance, second, to matrices of random variables having a distribution from an exponential
family (non-Gaussian) and, third, to a two-sided alternative for matrices with
Gaussian elements.

Keywords: detection of sparse signal, minimax testing, minimax adaptive testing, random
matrices, sharp detection bounds.

1 Introduction

In this paper, we observe a high-dimensional random matrix and we want to test for the
presence of a particular submatrix of much smaller size whose elements have expected
values larger than some threshold. We assume that the entries of the matrix are independent,
identically distributed (i.i.d.) random variables, but some underlying phenomenon
can significantly increase the expected value of the random variables in the submatrix.
The observations form an $N \times M$ matrix $Y = \{Y_{ij}\}_{i=1,\dots,N,\ j=1,\dots,M}$:

$$Y_{ij} = s_{ij} + \sigma \xi_{ij}, \quad i = 1,\dots,N, \quad j = 1,\dots,M, \tag{1.1}$$

where $\sigma > 0$, $\{\xi_{ij}\}$ are i.i.d. random variables and $s_{ij} \in \mathbb{R}$ for all $i \in \{1,\dots,N\}$,
$j \in \{1,\dots,M\}$. In the first part of the paper, the errors are assumed to have a Gaussian law
with known variance $\sigma^2$. Without loss of generality we take $\sigma = 1$ in this case. At the end
of the paper, we extend our results in different directions, as discussed later on. We test
the null hypothesis that all elements of the matrix $Y$ are i.i.d. standard Gaussian random
variables $\mathcal{N}(0, 1)$, that is

$$H_0 : s_{ij} = 0 \quad \forall\, i = 1,\dots,N, \ j = 1,\dots,M. \tag{1.2}$$

The alternative under consideration corresponds to $n \times m$ submatrices, with
$n \in \{1,\dots,N\}$ and $m \in \{1,\dots,M\}$, that have large enough entries. Let

$$A \subset \{1,\dots,N\}, \ \#(A) = n, \quad B \subset \{1,\dots,M\}, \ \#(B) = m, \quad C = A \times B, \tag{1.3}$$

and let $\mathcal{C}_{nm}$ be the collection of all subsets $C$ of the form (1.3). The set $\mathcal{C}_{nm}$ corresponds
to the collection of all $n \times m$ submatrices of an $N \times M$ matrix. For $a > 0$, which may depend
on $n$, $m$, $N$ and $M$, we consider the alternative

$$H_1 : \exists\, C \in \mathcal{C}_{nm} \ \text{such that} \ s_{ij} = 0 \ \text{if} \ (i,j) \notin C, \ \text{and} \ s_{ij} \ge a \ \text{if} \ (i,j) \in C \tag{1.4}$$

(in Remark 2.1 below we discuss that a slightly larger alternative can be considered).
The components of the matrix $Y$ are independent under the alternative as well. Denote by
$P_S$ the probability measure that corresponds to observations (1.1) with matrix $S = \{s_{ij}\}$
and by $E_S$ the expected value with respect to the measure $P_S$.

Let $\mathcal{S}_{nm,a}$ be the collection of all matrices $S = S_C$ that satisfy (1.4).
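
To make the observation model (1.1) and the alternative (1.4) concrete, here is a minimal Python sketch (not part of the original paper) that draws a Gaussian matrix with a planted submatrix. The function name `planted_matrix`, the `seed` argument and the choice of putting exactly the value $a$ on $C = A \times B$ are illustrative assumptions.

```python
import random


def planted_matrix(N, M, rows, cols, a, seed=0):
    """Draw Y_ij = s_ij + xi_ij with xi_ij ~ N(0, 1) i.i.d.

    s_ij = a on the product set rows x cols (the set C = A x B) and 0 elsewhere.
    Taking a = 0 gives a draw from the null hypothesis.
    """
    rng = random.Random(seed)
    row_set, col_set = set(rows), set(cols)
    return [
        [(a if (i in row_set and j in col_set) else 0.0) + rng.gauss(0.0, 1.0)
         for j in range(M)]
        for i in range(N)
    ]
```

With the same seed, a call with `a = 0` and a call with `a > 0` share the same noise, which is convenient for comparing the null and the alternative on identical draws.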

We discuss here only right-hand side alternatives but, obviously, left-hand side alternatives
can be treated in the same way by considering the variables $-Y_{ij}$ instead of $Y_{ij}$.

We extend our results to three different setups and sketch the proofs of the results.
First, we consider errors having a Gaussian distribution with unknown variance $\sigma^2$. Then,
the errors are assumed to have a distribution which belongs to an exponential family (not
necessarily Gaussian). Finally, in the initial case of Gaussian errors with known variance,
we consider a two-sided alternative of our test problem.

We are interested here in sparse matrices, i.e., the case when $n$ is much smaller than
$N$ and $m$ is much smaller than $M$.

Sparsity assumptions were introduced for vectors. Estimation as well as hypothesis
testing for vectors have been thoroughly studied in the literature; see, for example, Bickel, Ritov
and Tsybakov [4] and references therein, and Donoho and Jin [5].

In the context of matrices, different sparsity assumptions can be imagined. For example,
matrix completion for low rank matrices with nuclear norm penalization has been
studied by Koltchinskii, Lounici and Tsybakov [8]. Other results will be discussed later
on.

We study the hypothesis testing problem in a minimax setting. A test is any
measurable function of the observations, $\psi = \psi(Y)$, taking values in $[0,1]$. For such
a test $\psi = \psi(Y)$, we denote the first-type error, the second-type error under a simple
alternative and the maximal second-type error over the set $\mathcal{S}_{nm,a}$ by

$$\alpha(\psi) = E_0 \psi, \quad \beta(\psi, S) = E_S(1 - \psi), \quad \beta_{nm,a}(\psi) = \sup_{S \in \mathcal{S}_{nm,a}} \beta(\psi, S),$$

respectively. Let the global testing error be the sum of the first and second-type errors:

$$\gamma(\psi, S) = \alpha(\psi) + \beta(\psi, S), \quad \gamma_{nm,a}(\psi) = \sup_{S \in \mathcal{S}_{nm,a}} \gamma(\psi, S) = \alpha(\psi) + \beta_{nm,a}(\psi).$$

We define the minimax second-type error at fixed level $\alpha \in (0, 1)$ as

$$\beta_{nm,a,\alpha} = \inf_{\psi :\, \alpha(\psi) \le \alpha} \beta_{nm,a}(\psi).$$

Similarly, let the minimax global testing risk be

$$\gamma_{nm,a} = \inf_{\psi} \gamma_{nm,a}(\psi).$$

From now on, we assume in the asymptotics that $N \to \infty$, $M \to \infty$ and $n = n_{NM} \to \infty$,
$m = m_{NM} \to \infty$. Other assumptions will be given later.

We suppose that $a > 0$ is unknown. The aim of this paper is to give asymptotically
sharp boundaries for minimax testing. That means, we are first interested in the conditions
on $a = a_{NM}$ which guarantee distinguishability, i.e., the fact that $\gamma_{nm,a} \to 0$
and $\beta_{nm,a,\alpha} \to 0$ for any $\alpha \in (0, 1)$. We construct a testing procedure based on a linear statistic combined
with a scan statistic. We prove the upper bounds of the minimax testing risk of this
procedure. Second, we describe conditions on $a$ for which we have indistinguishability,
i.e., the convergence $\gamma_{nm,a} \to 1$ and $\beta_{nm,a,\alpha} \to 1 - \alpha$ for any $\alpha \in (0, 1)$. These results are
called the lower bounds. The two sets of conditions are complementary and match in rate
and constant.

Often the sizes $n$, $m$ of the submatrix are unknown, but we know a set $\mathcal{K}_{NM}$ of pairs of
indices $(n, m) \in \{1,\dots,N\} \times \{1,\dots,M\}$ containing the true one. Then we consider the
"adaptive" problem for the combined alternative $\mathcal{S}_{NM,a} = \bigcup_{(n,m) \in \mathcal{K}_{NM}} \mathcal{S}_{nm,a_{nm}}$, which
corresponds to a collection $a = \{a_{nm}, (n,m) \in \mathcal{K}_{NM}\}$. The quantities $\beta_{NM,a,\alpha}$, $\gamma_{NM,a}$
are defined in a similar way as above. We define a testing procedure and check that,
if $a_{nm}$ satisfies the conditions for distinguishability uniformly over the collection $a$, the
upper bounds still hold. The adaptive lower bounds hold as an easy consequence of the
minimax lower bounds.

The problem of choosing a submatrix in a Gaussian random matrix has been previously
studied by Sun and Nobel [9]. They were interested in maximal size submatrices of a matrix
with increasing size in two setups. First, they consider the case when the average of the
entries of the submatrix is larger than a given threshold and, second, when the entries
are well fitted by a two-way ANOVA matrix in the least-squares sense (i.e., the sum of
squares of residuals is smaller than some given threshold).

The algorithm for choosing such submatrices was previously introduced in Shabalin et
al. [10], who were also interested in finding large average submatrices. This problem is
strongly motivated by the research on gene expression in microarray data. In these large
matrices it is necessary to recover biclusters, that is, associations between sets of samples
(rows) and sets of variables (columns). These associations together with clinical and
biological information are "a first step in identifying disease subtypes and gene regulatory
networks". Many other algorithms for biclustering are discussed and compared on real
databases concerning breast and lung cancer studies.

Similar problems were considered in Addario-Berry et al. [1]. They use the same
testing procedures for vectors of random variables, where the alternatives may have various
combinatorial structures. In particular, they consider the example of detecting a clique
of a certain size in a graph and they compute upper and lower bounds for the Bayesian
test error. A bipartite graph of size $(N, M)$ is a graph having edges only between the $N$
vertices of one set and the $M$ vertices of a second set. A biclique is a complete bipartite
subgraph of size $(n, m)$, i.e., a subgraph where all $n$ vertices from the first set are connected
to the $m$ vertices from the second set. We consider the problem of detecting a biclique.
Our results are sharp minimax and adaptive to the size of the unknown biclique.

The plan of the paper is as follows. In Section 2.1 we give the test procedures. We
state the conditions on the detection boundary $a$ such that distinguishability is possible.
Under mild additional assumptions we give the conditions on $a$ so that the alternative is
indistinguishable from the null hypothesis.

In Section 2.2 we consider the adaptive setup where $(n, m)$ is unknown but belongs to
some collection of sequences $\mathcal{K}_{NM}$. We compute the adaptive rates of testing of a slightly
modified test procedure.

We include in Section 2.3 discussions of previously studied alternatives: subsets without
structure and rectangular submatrices. The first case can be assimilated to the detection of
a sparse signal in vector observations of length $N \times M$, so the set of alternatives and the
detection boundary are much larger than in our case. We summarize well-known results
by Ingster [6], Ingster and Suslina [7] and Donoho and Jin [5]. The second case is the
detection of rectangles in the large matrix (connected submatrices), which constitutes a
set of alternatives smaller than ours. This case is studied in Arias-Castro et al. [2] among
other geometric shapes of clusters. In order to be self-contained we state and
prove sharp upper and lower bounds for the rectangular clusters.

In Section 3 we perform a numerical study of the procedures that attain the sharp
upper bounds. In order to compute the scan statistic, a heuristic stochastic algorithm
from Shabalin et al. [10] is used. The empirical detection boundary is very close to the
one predicted by our results.

In Section 4 we give extensions of our results to Gaussian variables of unknown variance
$\sigma^2$, to non-Gaussian matrices with distribution in an exponential family and to two-sided
tests for Gaussian matrices, respectively.

Section 5 is mainly concerned with the proof of the lower bounds stated in Section 2.1.3.
The Appendix contains the proofs of the other results of the paper.

2 Main results

We write $n = n_{NM}$, $m = m_{NM}$ and $a = a_{NM}$. We recall that asymptotics are taken
as

$$N \to \infty, \quad M \to \infty, \quad n \to \infty, \quad m \to \infty.$$

Denote $p = n/N$, $q = m/M$. We suppose, moreover, that

$$p \to 0, \quad q \to 0. \tag{2.1}$$

2.1 Known size of the submatrix

In a minimax setup, we suppose that for each $N$, $M$ we know $n$ and $m$.

Let us consider two test procedures, one based on a linear statistic, $\psi^{lin}_H$, and the other
based on a scan statistic, $\psi^{max}$. The final test procedure $\psi^*$ will reject as soon as at least
one of them rejects the null hypothesis.

2.1.1 Linear statistic

The first test procedure $\psi^{lin}_H$ is based on the linear statistic

$$t_{lin} = \frac{1}{\sqrt{NM}} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij}, \quad \psi^{lin}_H = 1_{\{t_{lin} > H\}}.$$

One easily gets the following non-asymptotic result.

Proposition 2.1 We have, for any real number $H$,

$$\alpha(\psi^{lin}_H) = \Phi(-H), \quad \beta_{nm,a}(\psi^{lin}_H) \le \Phi(H - a\sqrt{nmpq}),$$

where $\Phi$ denotes the cumulative distribution function of a standard Gaussian random variable.

Proof. Observe that the statistic $t_{lin}$ is standard Gaussian under $P_0$, which yields the
relation for $\alpha(\psi^{lin}_H)$. Also $t_{lin} \sim \mathcal{N}(h_S, 1)$ under $P_S$-probability, $S \in \mathcal{S}_{nm,a}$, where

$$h_S = \frac{1}{\sqrt{NM}} \sum_{i,j} s_{ij} \ge \frac{a\,nm}{\sqrt{NM}} = a\sqrt{nmpq}.$$

This yields the relation $\beta(\psi^{lin}_H, S) \le \Phi(H - a\sqrt{nmpq})$ for all $S \in \mathcal{S}_{nm,a}$ and the same inequality
for $\beta_{nm,a}(\psi^{lin}_H)$. Proposition 2.1 follows. □
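
As an illustration of Proposition 2.1, the following Python sketch computes the linear statistic, the corresponding test decision and the two error bounds. The function names are hypothetical and matrices are represented as plain lists of rows; this is a sketch under those assumptions, not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Gaussian cumulative distribution function


def t_lin(Y):
    """Linear statistic: sum of all entries divided by sqrt(N * M)."""
    N, M = len(Y), len(Y[0])
    return sum(map(sum, Y)) / sqrt(N * M)


def linear_test(Y, H):
    """Return 1 (reject the null hypothesis) when t_lin exceeds the threshold H."""
    return 1 if t_lin(Y) > H else 0


def linear_error_bounds(N, M, n, m, a, H):
    """First-type error Phi(-H) and the bound Phi(H - a * sqrt(n m p q))
    on the maximal second-type error, with p = n / N and q = m / M."""
    p, q = n / N, m / M
    return PHI(-H), PHI(H - a * sqrt(n * m * p * q))
```

For example, `linear_error_bounds(200, 200, 20, 20, 1.0, 2.3262)` gives a first-type error of about 1%, while the second-type bound is still large because $a\sqrt{nmpq} = 2$ is below the threshold.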

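2.1.2 Scan statistic

The second test $\psi^{max}$ is based on the maximal sum over all submatrices. Put

$$Y_C = \frac{1}{\sqrt{nm}} \sum_{(i,j) \in C} Y_{ij}, \quad C \in \mathcal{C}_{nm}, \tag{2.2}$$

and

$$t_{max} = \max_{C \in \mathcal{C}_{nm}} Y_C, \quad \psi^{max} = 1_{\{t_{max} > T_{nm}\}}, \tag{2.3}$$

where $T_{nm} = \sqrt{2 \log(G_{nm})}$, $G_{nm} = \#(\mathcal{C}_{nm}) = \binom{N}{n} \binom{M}{m}$.

The computation of this statistic is discussed in Section 3.

Proposition 2.2 Assume (2.1). Then $\alpha(\psi^{max}) \to 0$ and

$$\beta_{nm,a}(\psi^{max}) \le \Phi(T_{nm} - a\sqrt{nm}).$$

Proof is given in Section 6.1.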
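
For very small matrices the scan statistic can be computed exactly, which is useful for checking heuristics. The sketch below is an illustration with hypothetical function names: it enumerates all row subsets and uses the fact that, for a fixed set of rows, the best $m$ columns are those with the $m$ largest column sums. The cost grows like $\binom{N}{n}$, so this is not a practical method for the matrix sizes considered in the paper.

```python
from itertools import combinations
from math import comb, log, sqrt


def scan_threshold(N, M, n, m):
    """Threshold T_nm = sqrt(2 log G_nm) with G_nm = C(N, n) * C(M, m)."""
    return sqrt(2.0 * log(comb(N, n) * comb(M, m)))


def t_max_bruteforce(Y, n, m):
    """Exact scan statistic: maximum over all n x m submatrices of the
    entry sum divided by sqrt(n * m)."""
    N, M = len(Y), len(Y[0])
    best = float("-inf")
    for rows in combinations(range(N), n):
        # Column sums restricted to the chosen rows.
        col_sums = [sum(Y[i][j] for i in rows) for j in range(M)]
        # For fixed rows, the optimal columns are the m largest column sums.
        best = max(best, sum(sorted(col_sums, reverse=True)[:m]))
    return best / sqrt(n * m)
```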

2.1.3 Sharp minimax rates

The following theorem gives sufficient conditions on the detection boundary $a$ such that
distinguishability holds. The test procedure which attains these bounds is

$$\psi^* = \max\{\psi^{lin}_H, \psi^{max}\},$$

for properly chosen $H$ and $T_{nm}$.

Theorem 2.1 Upper bounds. Assume (2.1) and let $a$ be such that at least one of the
following conditions holds:

$$a^2 nmpq \to \infty \tag{2.4}$$

or

$$\liminf \frac{a^2 nm}{2(n \log(p^{-1}) + m \log(q^{-1}))} > 1. \tag{2.5}$$

Then $\psi^*$ with $H \to \infty$, $H \le c\,a\sqrt{nmpq}$, $c < 1$, and with $T_{nm} = \sqrt{2 \log G_{nm}}$,
$G_{nm} = \binom{N}{n} \binom{M}{m}$, is such that $\gamma_{nm,a}(\psi^*) \to 0$.

Proof. By Proposition 2.1, if $H \to \infty$ then $\alpha(\psi^{lin}_H) = \Phi(-H) \to 0$, and if $H \le c\,a\sqrt{nmpq}$
for $c < 1$, then under (2.4)

$$\beta_{nm,a}(\psi^{lin}_H) \le \Phi(H - a\sqrt{nmpq}) \le \Phi(-(1 - c)\,a\sqrt{nmpq}) \to 0.$$

Thus $\gamma_{nm,a}(\psi^{lin}_H) \to 0$.

By Proposition 2.2, $\alpha(\psi^{max}) \to 0$. It is not hard to check that under (2.1)

$$\log(G_{nm}) \sim n \log(p^{-1}) + m \log(q^{-1}). \tag{2.6}$$

Thus, under (2.5),

$$a\sqrt{nm} - T_{nm} = T_{nm} \left( \frac{a\sqrt{nm}\,(1 + o(1))}{\sqrt{2(n \log(p^{-1}) + m \log(q^{-1}))}} - 1 \right) \to \infty$$

and this implies that $\gamma_{nm,a}(\psi^{max}) \to 0$. □

Theorem 2.2 Lower bounds. Assume (2.1) and

$$\frac{\log \log(p^{-1})}{\log(q^{-1})} \to 0, \quad \frac{\log \log(q^{-1})}{\log(p^{-1})} \to 0. \tag{2.7}$$

Moreover, assume that

$$n \log(p^{-1}) \asymp m \log(q^{-1}), \tag{2.8}$$

and that the following two conditions are satisfied:

$$a^2 nmpq \to 0 \tag{2.9}$$

and

$$\limsup \frac{a^2 nm}{2(n \log(p^{-1}) + m \log(q^{-1}))} < 1. \tag{2.10}$$

Then distinguishability is impossible, i.e., $\gamma_{nm,a} \to 1$ and $\beta_{nm,a,\alpha} \to 1 - \alpha$ for any
$\alpha \in (0,1)$.

Proof is given in Section 5.

These results for the upper and the lower bounds can be interpreted as follows. Under
the rather mild conditions (2.1), (2.7) and (2.8), the relations

$$a^2 nmpq \asymp 1, \quad a^2 nm \sim 2(n \log(p^{-1}) + m \log(q^{-1})) \tag{2.11}$$

define a sharp detection boundary in the problem with known $(n,m)$. Note that the
detection boundary can be written as

$$a \asymp \min\left\{ \frac{1}{\sqrt{nmpq}},\ \sqrt{\frac{2(n \log(p^{-1}) + m \log(q^{-1}))}{nm}} \right\}.$$
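
The two branches of the detection boundary are easy to evaluate numerically. The following sketch uses a hypothetical function name; the linear branch is only a rate, so its constant 1 is an assumption made for illustration, whereas the scan branch carries the sharp constant from (2.11).

```python
from math import log, sqrt


def detection_boundary(N, M, n, m):
    """Return (min of both branches, linear-branch rate, scan-branch level).

    linear branch: 1 / sqrt(n m p q)            (rate only, constant assumed 1)
    scan branch:   sqrt(2 (n log(1/p) + m log(1/q)) / (n m))
    with p = n / N and q = m / M.
    """
    p, q = n / N, m / M
    linear_rate = 1.0 / sqrt(n * m * p * q)
    scan_level = sqrt(2.0 * (n * log(1.0 / p) + m * log(1.0 / q)) / (n * m))
    return min(linear_rate, scan_level), linear_rate, scan_level
```

For $N = M = 200$ and $n = m = 20$ the linear branch equals 0.5 and the scan branch is about 0.68.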
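
2.2 Adaptation to the size of the submatrix

If the size of the submatrix is unknown, we suppose that, for each $N$ and $M$, there is a
set $\mathcal{K}_{NM}$ containing the unknown pair $(n,m)$, such that

$$\sup_{(n,m) \in \mathcal{K}_{NM}} \left( \frac{1}{n} + \frac{1}{m} + \frac{n}{N} + \frac{m}{M} \right) \to 0,$$

as $N, M \to \infty$. This implies that

$$\sup_{(n,m) \in \mathcal{K}_{NM}} \left( \frac{\log(N)}{n \log(p^{-1})} + \frac{\log(M)}{m \log(q^{-1})} \right) \to 0, \quad \text{as } N, M \to \infty. \tag{2.12}$$

The adaptive test procedure is $\psi^*_{NM} = \max\{\psi^{lin}_H, \psi_{NM}\}$, where $\psi^{lin}_H$ is based on the linear
statistic defined in Section 2.1 and $\psi_{NM}$ is a modified version of $\psi^{max}$ defined as follows.
Set

$$V_{nm} = \sqrt{2 \log(NM\,G_{nm})}, \quad t_{NM,max} = \max_{(n,m) \in \mathcal{K}_{NM}} \max_{C \in \mathcal{C}_{nm}} Y_C / V_{nm}, \quad \psi_{NM} = 1_{\{t_{NM,max} > 1\}}.$$

Proposition 2.3 Assume (2.12). We have $\alpha(\psi_{NM}) \to 0$ and

$$\beta_{NM,a}(\psi_{NM}) \le \max_{(n,m) \in \mathcal{K}_{NM}} \Phi(V_{nm} - a_{nm}\sqrt{nm}).$$

Proof is given in Section 6.1.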
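
A small Python sketch of the adaptive scan test may help fix ideas. It is only an illustration with hypothetical function names: the inner maximum is computed by exhaustive enumeration, which is feasible for tiny matrices only, and the candidate set of sizes is passed as a plain list of pairs.

```python
from itertools import combinations
from math import comb, log, sqrt


def max_submatrix_sum(Y, n, m):
    """Largest entry sum over all n x m submatrices (exhaustive over row subsets)."""
    N, M = len(Y), len(Y[0])
    best = float("-inf")
    for rows in combinations(range(N), n):
        col_sums = sorted((sum(Y[i][j] for i in rows) for j in range(M)), reverse=True)
        best = max(best, sum(col_sums[:m]))
    return best


def adaptive_scan_test(Y, sizes):
    """Return (decision, statistic) where the statistic is the maximum over the
    candidate sizes (n, m) of Y_C / V_nm, with V_nm = sqrt(2 log(N M G_nm)),
    and the test rejects when the statistic exceeds 1."""
    N, M = len(Y), len(Y[0])
    t = float("-inf")
    for n, m in sizes:
        V = sqrt(2.0 * log(N * M * comb(N, n) * comb(M, m)))
        t = max(t, max_submatrix_sum(Y, n, m) / sqrt(n * m) / V)
    return (1 if t > 1.0 else 0), t
```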

Theorem 2.3 Let the set $\mathcal{K}_{NM}$ be such that condition (2.12) holds.

Upper bounds. Let $a = a_{NM} = \{a_{nm}, (n,m) \in \mathcal{K}_{NM}\}$ be such that at least one of
the following conditions holds:

$$\min_{(n,m) \in \mathcal{K}_{NM}} a_{nm}^2 nmpq \to \infty \tag{2.13}$$

or

$$\liminf \min_{(n,m) \in \mathcal{K}_{NM}} \frac{a_{nm}^2 nm}{2(n \log(p^{-1}) + m \log(q^{-1}))} > 1. \tag{2.14}$$

Then $\gamma_{NM,a}(\psi^*_{NM}) \to 0$.

Lower bounds. Suppose that for each $N$, $M$ there exists $(n^*, m^*) \in \mathcal{K}_{NM}$ such that

$$\frac{\log \log(N/n^*)}{\log(M/m^*)} \to 0, \quad \frac{\log \log(M/m^*)}{\log(N/n^*)} \to 0,$$

and that

$$n^* \log(N/n^*) \asymp m^* \log(M/m^*),$$

as $N \to \infty$ and $M \to \infty$. Let $a = a_{NM} = \{a_{nm}, (n, m) \in \mathcal{K}_{NM}\}$ be such that, with $p^* = n^*/N$ and $q^* = m^*/M$,

$$a_{n^*m^*}^2\, n^* m^* p^* q^* \to 0$$

and

$$\limsup \frac{a_{n^*m^*}^2\, n^* m^*}{2(n^* \log((p^*)^{-1}) + m^* \log((q^*)^{-1}))} < 1.$$

Then $\gamma_{NM,a} \to 1$ and $\beta_{NM,a,\alpha} \to 1 - \alpha$ for any $\alpha \in (0, 1)$.

Proof. Note that the test $\psi^{lin}_H$ does not depend on $n$, $m$. Therefore, for distinguishability
in the adaptive problem it is sufficient to assume (2.13) (which is a uniform version
of (2.4)). We have $\gamma_{NM,a}(\psi^{lin}_H) \to 0$.

By (2.6), we have

$$\min_{(n,m) \in \mathcal{K}_{NM}} (a_{nm}\sqrt{nm} - V_{nm}) = \min_{(n,m) \in \mathcal{K}_{NM}} T_{nm} \left( \frac{a_{nm}\sqrt{nm}\,(1 + o(1))}{\sqrt{2(n \log(p^{-1}) + m \log(q^{-1}))}} - \sqrt{1 + \frac{\log(NM)}{n \log(p^{-1}) + m \log(q^{-1})}} \right),$$

which goes to infinity under (2.12) and (2.14). Thus, by Proposition 2.3, we have
$\gamma_{NM,a}(\psi_{NM}) \to 0$.

The lower bounds in the adaptive setup are an obvious consequence of Theorem 2.2.

□

Remark 2.1 We can state the alternative hypothesis in a more general form:

$$H_1 : \exists\, C \in \mathcal{C}_{nm} \ \text{such that} \ s_{ij} = 0 \ \text{if} \ (i,j) \notin C, \ \text{and} \ \frac{1}{nm} \sum_{(i,j) \in C} s_{ij} \ge a_{nm}.$$

Indeed, our probabilities of error depend on the elements of the submatrix $C$ only through
the sum of its elements. Therefore, the previous test procedures attain the same rates
and the same lower bound techniques give the previous results for this more general
test problem.

2.3 Related testing problems

Let us consider two related testing problems under the model (1.1) and the null hypothesis (1.2).

2.3.1 Subsets without structure

Let $\mathcal{D}_k$ consist of all subsets $D \subset \{1,\dots,N\} \times \{1,\dots,M\}$ of cardinality $\#(D) = k$ and
let $k = nm$. Let us consider the alternative

$$H_1 : \exists\, D \in \mathcal{D}_{nm} \ \text{such that} \ s_{ij} = 0 \ \text{if} \ (i,j) \notin D, \ \text{and} \ s_{ij} \ge a \ \text{if} \ (i,j) \in D \tag{2.15}$$

(we do not suppose that the set $D$ has a product structure). Clearly we can consider
the matrix $\{Y_{ij}\}$ as a vector of dimension $P = NM$, and the problem is well studied as
$P \to \infty$; see Ingster [6], Ingster and Suslina [7], Donoho and Jin [5].

The results are as follows. Let $k = P^{1-\beta}$, $\beta \in (0,1)$. First, let $\beta \le 1/2$, which
corresponds to $P = O(k^2)$, i.e., $NM = O((nm)^2)$. Then the detection boundary is
determined by the first condition in (2.11). It means that distinguishability is impossible
when $a^2 nmpq \to 0$. On the other hand, if $a^2 nmpq \to \infty$, then distinguishability is provided
by the tests of the type $\psi^{lin}_H$.

Let $\beta \in (1/2, 1)$. Then the detection boundary is determined by the relation

$$a^* \sim \varphi(\beta)\sqrt{\log(P)} = \varphi(\beta)\sqrt{\log(NM)},$$

where

$$\varphi(\beta) = \begin{cases} \sqrt{2\beta - 1}, & 1/2 < \beta \le 3/4, \\ \sqrt{2}\,(1 - \sqrt{1 - \beta}), & 3/4 < \beta < 1, \end{cases} \qquad \beta = 1 - \frac{\log(nm)}{\log(NM)}.$$

This means that, if $\limsup a/(\varphi(\beta)\sqrt{\log(NM)}) < 1$, then distinguishability is impossible,
and if $\liminf a/(\varphi(\beta)\sqrt{\log(NM)}) > 1$, then distinguishability is provided by the "higher
criticism" tests $\psi^{HC} = 1_{\{L_{HC} > H\}}$ based on statistics

$$L(t) = \dots, \quad L_{HC} = \max L(t), \quad t_0 > 0,$$

with $H = \sqrt{c \log \log(NM)}$, $c > 2$.
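
The sparse-regime boundary for subsets without structure can be evaluated with a few lines of Python. The function names are hypothetical, and the sketch assumes the parametrization $k = P^{1-\beta}$ with $P = NM$ used in this section.

```python
from math import log, sqrt


def phi(beta):
    """Boundary function for 1/2 < beta < 1: sqrt(2 beta - 1) up to beta = 3/4,
    and sqrt(2) (1 - sqrt(1 - beta)) beyond."""
    if not 0.5 < beta < 1.0:
        raise ValueError("phi is only defined here for 1/2 < beta < 1")
    if beta <= 0.75:
        return sqrt(2.0 * beta - 1.0)
    return sqrt(2.0) * (1.0 - sqrt(1.0 - beta))


def unstructured_boundary(N, M, k):
    """a* ~ phi(beta) sqrt(log(N M)) with beta = 1 - log(k) / log(N M)."""
    P = N * M
    beta = 1.0 - log(k) / log(P)
    return phi(beta) * sqrt(log(P))
```

The two branches of `phi` agree at $\beta = 3/4$, where both equal $\sqrt{1/2}$.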

2.3.2 Rectangular submatrices

Let $\mathcal{E}_{nm}$ consist of all rectangles of size $n \times m$, i.e., of the sets $E_{kl} = \{k + 1,\dots,k + n\} \times
\{l + 1,\dots,l + m\}$, $0 \le k \le N - n$, $0 \le l \le M - m$, and let the alternative be of the form

$$H_1 : \exists\, E \in \mathcal{E}_{nm} \ \text{such that} \ s_{ij} = 0 \ \text{if} \ (i,j) \notin E, \ \text{and} \ s_{ij} \ge a \ \text{if} \ (i,j) \in E. \tag{2.16}$$

Similar problems were studied recently in Arias-Castro et al. [2] for other related
geometrically-shaped clusters. Nevertheless, we give here the proof, since the results of
Arias-Castro et al. [2] cover only square matrices in this particular setting.
The detection boundary for (2.16) is determined by

$$a^2 \sim \frac{2(\log(p^{-1}) + \log(q^{-1}))}{nm}.$$

Let us consider the test $\psi_Z$ based on the scan statistic over a particular set of possible
rectangles, which is a suitable "grid" on $\mathcal{E}_{nm}$ constructed as follows.

Take $\eta_{nm} = \eta > 0$. Put $n_k = (k - 1) n \eta$, $k = 1,\dots,K$, $m_l = (l - 1) m \eta$, $l = 1,\dots,L$,
where $K$, $L$ are such that $N - n(1 + \eta) < n_K \le N - n$, $M - m(1 + \eta) < m_L \le M - m$,
which yield $K \sim N/(\eta n)$, $L \sim M/(\eta m)$. Put

$$Z_{kl} = \frac{1}{\sqrt{nm}} \sum_{(i,j) \in E_{n_k, m_l}} Y_{ij}, \quad Z = \max_{1 \le k \le K,\ 1 \le l \le L} Z_{kl}, \quad \psi_Z = 1_{\{Z > \sqrt{2 \log(KL)}\}}.$$

In this construction, we considered only $K \times L$ rectangles: $E_{n_k, m_l}$ for $k = 1,\dots,K$ and
$l = 1,\dots,L$. Thus, we scan over a number of rectangles which is much smaller than the
cardinality of $\mathcal{E}_{nm}$ (for technical reasons) and which is also much larger than the set of
non-overlapping rectangles (this set would not be large enough).
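
The grid scan over rectangles can be sketched as follows. This is an illustration only: the function name is hypothetical, the grid steps $n\eta$ and $m\eta$ are rounded down to integers (at least 1), which is an assumption not discussed in the text, and two-dimensional prefix sums are used so that each rectangle sum costs constant time.

```python
from math import log, sqrt


def rectangle_grid_test(Y, n, m, eta):
    """Scan n x m rectangles whose top-left corners lie on a grid with steps
    about n*eta (rows) and m*eta (columns). Returns (decision, Z, threshold)."""
    N, M = len(Y), len(Y[0])
    # S[i][j] holds the sum of Y over the first i rows and first j columns.
    S = [[0.0] * (M + 1) for _ in range(N + 1)]
    for i in range(N):
        for j in range(M):
            S[i + 1][j + 1] = Y[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    row_offsets = range(0, N - n + 1, max(1, int(n * eta)))
    col_offsets = range(0, M - m + 1, max(1, int(m * eta)))
    K, L = len(row_offsets), len(col_offsets)
    Z = max(
        (S[r + n][c + m] - S[r][c + m] - S[r + n][c] + S[r][c]) / sqrt(n * m)
        for r in row_offsets
        for c in col_offsets
    )
    threshold = sqrt(2.0 * log(K * L))
    return (1 if Z > threshold else 0), Z, threshold
```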

Theorem 2.4 Assume (2.1). Then

Upper bounds. Let

$$\liminf \frac{a^2 nm}{2(\log(p^{-1}) + \log(q^{-1}))} > 1,$$

and let $\eta = \eta_{nm}$ be taken in such a way that $\eta \to 0$, $n\eta \to \infty$, $m\eta \to \infty$, $|\log(\eta)| = o(|\log(pq)|)$.
Then $\gamma_{nm,a}(\psi_Z) \to 0$ for the test procedure $\psi_Z$ previously described.

Lower bounds. Let

$$\limsup \frac{a^2 nm}{2(\log(p^{-1}) + \log(q^{-1}))} < 1.$$

Then $\gamma_{nm,a} \to 1$, $\beta_{nm,a,\alpha} \to 1 - \alpha$ for any $\alpha \in (0, 1)$.

Proof is given in Appendix, Section 6.6.



Note that the separation rates, i.e., the asymptotics of $a$ that provide distinguishability
for the alternative (1.4), are intermediate between the fast separation rates for the
alternative (2.16) and the slow rates for the alternative without structure (2.15).

Let us consider the particular case of square matrices ($N = M$) and square submatrices
($n = m$) such that $n = N^{1-\beta}$ for some $\beta \in (0,1)$. The sharp asymptotic rates of
the detection boundaries can be compared in Table 1.

Rates | No structure (2.15) | Submatrix (1.4) | Rectangles (2.16)
$\beta \in (0, 1/2]$ | | $N^{-(1-\beta)/2}\sqrt{4\beta\log(N)}$ | $N^{-(1-\beta)}\sqrt{4\beta\log(N)}$
$\beta \in (1/2, 1)$ | | |

Table 1: Table of sharp asymptotic rates of the detection boundary $a^*$ for square matrices
and $n = N^{1-\beta}$



3 Simulations

We have implemented the testing procedure $\psi^* = \max\{\psi^{lin}_H, \psi^{max}\}$ on synthetic data.
While the linear procedure is straightforward, the computation of the statistic $t_{max} =
\max_{C \in \mathcal{C}_{nm}} Y_C$ is done by using the heuristic algorithm introduced and studied empirically
by Shabalin et al. [10]. This algorithm is also implemented and studied by Sun and
Nobel [9] with good empirical results.

Let us briefly recall this algorithm: we randomly choose a set of $n$ rows out of $N$.
Then, we sum in every column the elements of the previously selected rows. We now select
the columns corresponding to the $m$ largest sums obtained in this way. Next, we sum
in every row the elements belonging to the selected columns and select the rows
corresponding to the $n$ largest sums. We repeat these steps until the sum of elements
$Y_{ij}$ of the selected submatrix does not increase anymore. As the procedure can stop at
a local maximum, we repeat the procedure $K$ times, where $K$ is large (in our simulation
$K = 10000$). We take the maximum value of the outputs. This replication is needed to
ensure that, with high probability, the output approaches the global maximum.
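
The alternating search recalled above can be sketched in Python as follows. This is an illustrative reading of the description, not the code of Shabalin et al. [10]: the function name `alternating_search`, the default number of restarts, the seed handling and the tie-breaking rule (ties are resolved by index order) are assumptions.

```python
import random
from math import sqrt


def alternating_search(Y, n, m, restarts=100, seed=0):
    """Heuristic approximation of t_max: alternate between choosing the best m
    columns for the current rows and the best n rows for the current columns,
    starting from random rows, and keep the best value over all restarts."""
    rng = random.Random(seed)
    N, M = len(Y), len(Y[0])
    best = float("-inf")
    for _ in range(restarts):
        rows = rng.sample(range(N), n)  # random initial set of n rows
        total = float("-inf")
        while True:
            col_sums = [sum(Y[i][j] for i in rows) for j in range(M)]
            cols = sorted(range(M), key=col_sums.__getitem__, reverse=True)[:m]
            row_sums = [sum(Y[i][j] for j in cols) for i in range(N)]
            rows = sorted(range(N), key=row_sums.__getitem__, reverse=True)[:n]
            new_total = sum(row_sums[i] for i in rows)
            if new_total <= total:  # the submatrix sum no longer increases
                break
            total = new_total
        best = max(best, total)
    return best / sqrt(n * m)
```

Each sweep can only increase the submatrix sum, so the inner loop stops after finitely many steps; the result is a lower bound on the exact scan statistic and may correspond to a local maximum when too few restarts are used.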

We have simulated matrices of size $N \times M$ of i.i.d. standard Gaussian random variables
for $N = M = 200$ and $N = M = 500$.

We calibrated the test statistics $\psi^{lin}_H$ and $\psi^{max}$ in such a way that the first-type error
occurs with probability $\alpha = 1\%$. This calibration is done by using the Gaussian
quantile $H = 2.3262$ for $\psi^{lin}_H$ and the empirical quantile (out of 100 samples) for $\psi^{max}$.

Then, we have added the value $a > 0$ to the elements of the upper left submatrix of
size $n \times m$. From the resulting observations, we compute $\psi^* = \max\{\psi^{lin}_H, \psi^{max}\}$. We repeat
the test $L = 100$ times and average the values of the test procedure $\psi^*$. Denote by $\bar\psi^*$ this
average and note that $1 - \bar\psi^*$ estimates the second-type error probability.

We plot the estimated second-type error probabilities for different values of $a$ in the
neighborhood of the detection boundary predicted by our theorems, for different values
of $n$ and $m$. The results in Figure 1 correspond to $N = M = 200$, and those in Figure 2 to
$N = M = 500$.
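
The Monte Carlo estimate of the second-type error can be sketched as below. The sketch is illustrative only: the function names are hypothetical, and for brevity the example plugs in the linear test alone, whereas the experiment described here combines it with the scan test.

```python
import random
from math import sqrt


def estimate_type2_error(test, N, M, n, m, a, reps=100, seed=0):
    """Average 1 - test(Y) over `reps` draws of a standard Gaussian N x M matrix
    with the value a added to its upper-left n x m submatrix."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        Y = [
            [rng.gauss(0.0, 1.0) + (a if (i < n and j < m) else 0.0) for j in range(M)]
            for i in range(N)
        ]
        rejections += test(Y)
    return 1.0 - rejections / reps


def linear_test(Y, H=2.3262):
    """Linear test at level about 1%: reject when the normalized sum exceeds H."""
    return 1 if sum(map(sum, Y)) / sqrt(len(Y) * len(Y[0])) > H else 0
```

Any function mapping a matrix to 0 or 1 can be passed as `test`, so a combined procedure can be evaluated in the same way.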




Figure 1: Estimated second-type error probability for fixed $\alpha = 1\%$, detection boundary
$a^*$, $N = M = 200$

Figures 1 and 2 show that the empirical detection boundary is very close to $a^*$, which
is predicted by our theoretical results. Indeed, the second-type error probability is close to
0.5 at some point close to $a^*$. The plots also show a very fast decay of this probability in a
small vicinity of $a^*$. This means that the test is very powerful above a small neighbourhood
of the detection boundary $a^*$. Note also that, for fixed $N$ and $M$, $a^*$ decreases to 0 as $n$
and $m$ increase.

Figure 2: Estimated second-type error probability for fixed $\alpha = 1\%$, detection boundary
$a^*$, $N = M = 500$


4 Extensions

We extend our results in different directions. First, we consider matrices of i.i.d. random
variables having a Gaussian law with unknown variance $\sigma^2$; second, random variables having
a distribution belonging to an exponential family (not necessarily Gaussian); and, third,
the test problem with a two-sided alternative for Gaussian matrices.

4.1 Extension to Gaussian variables with unknown variance

Sharp results in Theorems 2.1 and 2.2 still hold if the random variables $Y_{ij}$ have unknown
variance $\sigma^2$, under a mild additional assumption. We sketch here the test procedure and
the proof of the upper bounds.

We estimate the unknown variance $\sigma^2$ of our data by $\hat\sigma^2$, where

$$\hat\sigma^2 = \dots$$

This estimator is unbiased under the null hypothesis, but biased under the alternative.
We replace $Y_{ij}$ by $Y_{ij}/\hat\sigma$ in the test procedure $\psi^*$. We denote $\tilde t_{lin} = t_{lin}/\hat\sigma$,
$\tilde t_{max} = t_{max}/\hat\sigma$ and put

$$\tilde\psi^* = \max\{1_{\{\tilde t_{lin} > H\}},\ 1_{\{\tilde t_{max} > T_{nm}\}}\}.$$

Theorem 4.1 Assume (2.1). We suppose that

$$\max \dots = o(1). \tag{4.1}$$

If $a$ is such that one of the following conditions holds:

$$\frac{a}{\sigma}\sqrt{nmpq} \to \infty \quad \text{or} \quad \liminf \frac{a\sqrt{nm}}{\sqrt{2\sigma^2(n \log(p^{-1}) + m \log(q^{-1}))}} > 1,$$

then $\tilde\psi^*$, with $H \to \infty$ such that $H \le O((a^2 nmpq/\sigma^2)^{1/2})$ and with $T_{nm} =
\sqrt{(2 + \delta) \log(\binom{N}{n} \binom{M}{m})}$ for some $\delta > 0$ small enough, is such that $\gamma_{nm,a}(\tilde\psi^*) \to 0$.

Proof is given in Section 6.7.

4.2 Extension to general law from an exponential family

In many applications, we do not have Gaussian observations. Instead, we have observations
$X_{ij}$, i.i.d. with probability density $g_{\theta_{ij}}$ from an exponential family, for all $i = 1,\dots,N$ and
$j = 1,\dots,M$. We explain here how to use the previous testing procedures in order to deal
with such setups and check that results similar to the case of Gaussian variables hold in
this case.

We assume that the laws belong to an exponential family in the general form

$$g_\theta(x) = e^{\eta(\theta) T(x) - c(\theta)} h(x), \quad \theta \in \Theta, \tag{4.2}$$

for the dominating measure $\mu$, where $\eta$ is supposed to be twice continuously differentiable and
strictly increasing on $\Theta$, i.e., $\eta'(\theta) > 0$. Recall that

$$c(\theta) = \log\left( \int e^{\eta(\theta) T(x)} h(x)\, \mu(dx) \right).$$

We consider a point $\theta^0$ interior to $\Theta$ and test the following hypotheses:

$$H_0 : \theta_{ij} = \theta^0 \quad \text{for all } i = 1,\dots,N, \ j = 1,\dots,M$$

against the alternative

$$H_1 : \exists\, C \in \mathcal{C}_{nm} \ \text{such that} \ \theta_{ij} = \theta^0 \ \text{if} \ (i,j) \notin C \ \text{and} \ \theta_{ij} - \theta^0 \ge d \ \text{if} \ (i,j) \in C. \tag{4.3}$$

Let us write model (4.2) in the canonical form: $g_\eta(x) = \exp(\eta \cdot T(x) - B(\eta)) h(x)$,
where $\eta = \eta(\theta)$ and $B(\eta) = \log\left( \int e^{\eta T(x)} h(x)\, \mu(dx) \right)$. Note that, if $\Delta\theta := \theta - \theta^0$ is small,
then $\Delta\eta := \eta - \eta^0 \sim \eta'(\theta^0)\Delta\theta$ and $\Delta\eta > 0$.

Now, let us change the variable and put $Y = (T(X) - m_0)/\sigma_0$, with $m_0 = E_{\theta^0}(T(X))$
and $\sigma_0 = \sqrt{\mathrm{Var}_{\theta^0}(T(X))}$, having density

$$f_s(y) = \exp(s \cdot y - A(s)) h(y),$$

where $s = \eta\sigma_0$ and $A(s) = B(s/\sigma_0) - s m_0/\sigma_0$. Here, we have $A'(s^0) = 0$, $A''(s^0) = 1$ and

$$A(s^0 + \Delta s) - A(s^0) \sim \frac{(\Delta s)^2}{2} \quad \text{and} \quad A'(s^0 + \Delta s) \sim \Delta s, \quad \text{as } \Delta s \to 0. \tag{4.4}$$

The original problem corresponds to testing the null hypothesis $s_{ij} = s^0$ for all $(i,j)$
against the alternative that, for some submatrix $C \in \mathcal{C}_{nm}$, $s_{ij} - s^0 \ge \Delta s = \sigma_0 \Delta\eta$ if and
only if $(i, j) \in C$. Set $a = \Delta s$.

We have the following results for exponential models.

Theorem 4.2 Assume (2.1). We suppose that

$$\frac{\log(p^{-1})}{m} + \frac{\log(q^{-1})}{n} \to 0. \tag{4.5}$$

Upper bounds. If $a$ is such that one of the following conditions holds:

$$A'(s^0 + a)\sqrt{nmpq} \to \infty \quad \text{or} \quad \liminf \frac{A'(s^0 + a)\sqrt{nm}}{\sqrt{2(n \log(p^{-1}) + m \log(q^{-1}))}} > 1,$$

then $\psi^*$, with $H \to \infty$ such that $H \le c A'(s^0 + a)\sqrt{nmpq}$ for some $0 < c < 1$ and with
$T_{nm} = \sqrt{(2 + \delta) \log(\binom{N}{n} \binom{M}{m})}$ for some $\delta > 0$ small enough, is such that $\gamma_{nm,a}(\psi^*) \to 0$.

Lower bounds. Assume, moreover, that conditions (2.7) and (2.8) hold. If $a$ is
such that the following two conditions are satisfied:

$$a^2 nmpq \to 0 \quad \text{and} \quad \limsup \frac{a\sqrt{nm}}{\sqrt{2(n \log(p^{-1}) + m \log(q^{-1}))}} < 1,$$

then $\gamma_{nm,a} \to 1$ and $\beta_{nm,a,\alpha} \to 1 - \alpha$ for any $\alpha \in (0, 1)$.

Proof of the upper bounds is given in the Appendix.

The proof of the lower bounds uses the relation (4.4) and follows exactly the same
lines as the proof of Theorem 2.2 in Section 2.1.3, except that we have to consider $T_{kl}^2 \sim
(2 + \delta)(k \log(p^{-1}) + l \log(q^{-1}))$ for some small $\delta > 0$ instead of the thresholds in (5.3). □

Under the assumption (4.5), the detection boundary $a^* \to 0$. Therefore,

$$A'(s^0 + a^*) \sim a^* \sim (\eta - \eta^0)\sigma_0 \sim \eta'(\theta^0)\sigma_0 d^*,$$

as $d^* \to 0$. It is well known that the Fisher information at $\theta^0$ in model (4.2) is
$I(\theta^0) = (\sigma_0 \eta'(\theta^0))^2$. In this way, we deduce the sharp asymptotic detection boundary
for alternative (4.3) from Theorem 4.2: $d^* = a^*/\sqrt{I(\theta^0)}$. Examples of such calculations
for the most popular probability distributions in the exponential family are given in Table 2.

Probability law | $\eta(\theta)$ | $m_0$ | $\sigma_0$ | $I^{1/2}(\theta^0) = \sigma_0 \eta'(\theta^0)$
Poisson($\theta$), $\theta > 0$ | $\log(\theta)$ | $\theta^0$ | $\sqrt{\theta^0}$ | $(\theta^0)^{-1/2}$
Ber($\theta$), $0 < \theta < 1$ | $\log(\theta/(1-\theta))$ | $\theta^0$ | $\sqrt{\theta^0(1-\theta^0)}$ | $(\theta^0(1-\theta^0))^{-1/2}$
Exp($\theta$), $\theta > 0$ | $-1/\theta$ | $\theta^0$ | $\theta^0$ | $(\theta^0)^{-1}$
 | | $(\theta^0)^2$ | $2(\theta^0)^2$ | $2(\theta^0)^{-1}$

Table 2: Examples of calculations for testing in general exponential families
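
The conversion $d^* = a^*/\sqrt{I(\theta^0)}$ can be illustrated with a short Python sketch. The function and dictionary names are hypothetical, only the Poisson, Bernoulli and exponential rows of Table 2 are used, and $a^*$ is taken from the scan relation $a^2 nm \sim 2(n\log(p^{-1}) + m\log(q^{-1}))$ alone, which is an assumption made to keep the example short.

```python
from math import log, sqrt


def a_star(N, M, n, m):
    """Scan-type boundary sqrt(2 (n log(1/p) + m log(1/q)) / (n m))."""
    p, q = n / N, m / M
    return sqrt(2.0 * (n * log(1.0 / p) + m * log(1.0 / q)) / (n * m))


# Square root of the Fisher information at theta0, as listed in Table 2.
SQRT_FISHER = {
    "poisson": lambda theta0: theta0 ** -0.5,
    "bernoulli": lambda theta0: (theta0 * (1.0 - theta0)) ** -0.5,
    "exponential": lambda theta0: 1.0 / theta0,
}


def d_star(law, theta0, N, M, n, m):
    """Detection boundary on the original parameter scale: a* / sqrt(I(theta0))."""
    return a_star(N, M, n, m) / SQRT_FISHER[law](theta0)
```

For a Poisson null with $\theta^0 = 4$, for instance, the boundary on the parameter scale is twice the Gaussian boundary, since $\sqrt{I(\theta^0)} = 1/2$.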

4.3 Extension to two-sided alternative

Let us consider model (1.1) and the same null hypothesis (1.2), against the two-sided
alternative:

$$H_1 : \exists\, C \in \mathcal{C}_{nm} \ \text{such that} \ s_{ij} = 0 \ \text{if} \ (i,j) \notin C \ \text{and} \ |s_{ij}| \ge a \ \text{if} \ (i,j) \in C.$$

Let us consider the following test procedures

$$\dots$$

and

$$Z_{max} = \max_{C \in \mathcal{C}_{nm}} Z_C, \ \text{where} \ Z_C = \frac{1}{\sqrt{2nm}} \sum_{(i,j) \in C} (Y_{ij}^2 - 1), \ \text{and} \ \psi_{max} = 1_{\{Z_{max} > T_{nm}\}}.$$

Theorem 4.3 Assume (2.1). We suppose that (4.5) holds.

Upper bounds. If $a$ is such that one of the following conditions holds:

$$a^2\sqrt{nmpq} \to \infty \quad \text{or} \quad \liminf \frac{a^2\sqrt{mn}}{2\sqrt{n \log(p^{-1}) + m \log(q^{-1})}} > 1,$$

then $\psi^* = \max\{\psi_H, \psi_{max}\}$, with $H \to \infty$ such that $H \le c\,a^2\sqrt{nmpq}/2$ for some $0 <
c < 1$ and with $T_{nm} = \sqrt{(2 + \delta) \log(\binom{N}{n} \binom{M}{m})}$ for some $\delta > 0$ small enough, is such that
$\gamma_{nm,a}(\psi^*) \to 0$.

Lower bounds. Assume, moreover, that conditions (2.7) and (2.8) hold. If $a$ is
such that the following two conditions are satisfied:

$$a^2\sqrt{nmpq} \to 0 \quad \text{and} \quad \limsup \frac{a^2\sqrt{mn}}{2\sqrt{n \log(p^{-1}) + m \log(q^{-1})}} < 1,$$

then $\gamma_{nm,a} \to 1$ and $\beta_{nm,a,\alpha} \to 1 - \alpha$ for any $\alpha \in (0, 1)$.

Proof is given in Section 6.9.

5 Proof of Theorem 2.2

In the first part, we give the proof of the theorem and the other parts of this section are
dedicated to proofs of intermediate results. More lemmas are in the Appendix.

5.1 Prior and truncated likelihood ratio

Let $S_C = \{s_{ij}\}$ be the matrix such that $s_{ij} = 0$ if $(i,j) \notin C$, and $s_{ij} = a$ if $(i,j) \in C$. Let us
consider the prior on the set of matrices:

$$\pi = G_{nm}^{-1} \sum_{C \in \mathcal{C}_{nm}} \delta_{S_C},$$

and let $P_\pi$ be the mixture of likelihoods $P_\pi = G_{nm}^{-1} \sum_{C \in \mathcal{C}_{nm}} P_{S_C}$. Consider the
likelihood ratio

$$L_\pi(Y) = \frac{dP_\pi}{dP_0}(Y) = G_{nm}^{-1} \sum_{C \in \mathcal{C}_{nm}} \frac{dP_{S_C}}{dP_0}(Y) = G_{nm}^{-1} \sum_{C \in \mathcal{C}_{nm}} \exp(-b^2/2 + b Y_C),$$

here and below we set $b^2 = a^2 nm$ and, for a submatrix $C$ of size $n \times m$, the statistics $Y_C$
are defined by (2.2). Since $\pi(\mathcal{S}_{nm,a}) = 1$, in order to obtain indistinguishability, $\gamma_{nm,a} \to 1$,
$\beta_{nm,a,\alpha} \to 1 - \alpha$ for all $\alpha \in (0, 1)$, it suffices to show

$$L_\pi(Y) \to 1 \ \text{in } P_0\text{-probability}. \tag{5.1}$$

Indeed,

$$\gamma_{nm,a} = \inf_\psi \sup_{S \in \mathcal{S}_{nm,a}} (\alpha(\psi) + \beta(\psi, S))
\ge \inf_\psi \left( E_0(\psi(Y)) + E_0[(1 - \psi(Y)) L_\pi(Y)] \right)
\ge E_0(\psi^*(Y)) + E_0[(1 - \psi^*(Y)) L_\pi(Y)],$$

where $\psi^*(Y) = 1_{\{L_\pi(Y) > 1\}}$ is the likelihood ratio test. Therefore (5.1) implies by Fatou's
lemma that

$$\liminf \gamma_{nm,a} \ge E_0\left[ \liminf \left( \psi^*(Y) + (1 - \psi^*(Y)) L_\pi(Y) \right) \right] = 1,$$

i.e., $\gamma_{nm,a} \to 1$. It is easy to deduce that $\beta_{nm,a,\alpha} \to 1 - \alpha$.

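Let us replace the statistic $L_\pi(Y)$ by its truncated version

$$ \tilde L_\pi(Y) = G_{nm}^{-1} \sum_{C \in \mathcal{C}_{nm}} \exp(-b^2/2 + bY_C)\,\mathbb{1}_{Z_C}, $$

where the events $Z_C$ are determined as follows. Set

$$ T_{kl} = \sqrt{2(\log(G_{kl}) + \log(nm))} \to \infty. $$

Take a small $\delta_1 > 0$ (which will be specified later) and set $k_0 = \delta_1 n$, $l_0 = \delta_1 m$. Let
$\mathcal{C}_{kl,C} = \{V \in \mathcal{C}_{kl} : V \subset C\}$ be the sub-matrices of $C \in \mathcal{C}_{nm}$ which are in $\mathcal{C}_{kl}$. Then we
set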

$$ Z_C = \bigcap_{k_0 \le k \le n,\ l_0 \le l \le m}\ \bigcap_{V \in \mathcal{C}_{kl,C}} \{Y_V \le T_{kl}\}. \qquad (5.2) $$

By (2.6), under the conditions on $k, l$ in (5.2) (and similarly to the equivalent of $T_{nm}^2$) we have

$$ T_{kl}^2 \sim 2(k\log(p^{-1}) + l\log(q^{-1})). \qquad (5.3) $$

Indeed, when looking at second-order moments of the likelihood ratio $L_\pi(Y)$, a large
contribution comes from overlapping submatrices inducing correlated random variables
$Y_{C_1}$ and $Y_{C_2}$. Then $V$ is the submatrix where $C_1$ and $C_2$ overlap. Our idea is to truncate
$Y_V$ for those matrices which overlap significantly ($\delta_1 n \le k \le n$ and $\delta_1 m \le l \le m$), hence
the truncated likelihood ratio $\tilde L_\pi(Y)$.



Proposition 5.1 Set $Z_{nm} = \bigcap_{C \in \mathcal{C}_{nm}} Z_C$. Under the assumptions of Theorem 2.2,

$$ P_0(Z_{nm}) \to 1. $$

The proof is given in the Appendix, Section 6.2.
Proposition 5.1 yields

$$ P_0\big(\tilde L_\pi(Y) = L_\pi(Y)\big) \to 1, $$

and in place of (5.1) it suffices to check that

$$ \tilde L_\pi(Y) \to 1 \quad \text{in } P_0\text{-probability}. \qquad (5.4) $$

In order to get (5.4) it suffices to verify two relations.

Proposition 5.2 Under the assumptions of Theorem 2.2 we have

$$ E_0(\tilde L_\pi) \to 1. $$

The proof is given in the Appendix, Section 6.3.

This Proposition together with the fact that

$$ E_0(\tilde L_\pi^2) \le 1 + o(1) \qquad (5.5) $$

imply that

$$ E_0(\tilde L_\pi - 1)^2 = \big(E_0(\tilde L_\pi^2) - 1\big) - 2\big(E_0(\tilde L_\pi) - 1\big) \le o(1), $$

which ends the proof of the theorem.

The remaining part of this section is devoted to obtaining the relation (5.5).

5.2 Second order moment of the truncated likelihood ratio 

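Let us prove the relation (5.5). We have

$$ E_0(\tilde L_\pi^2) = G_{nm}^{-2} \sum_{C_1 \in \mathcal{C}_{nm}} \sum_{C_2 \in \mathcal{C}_{nm}} E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Z_{C_1} \cap Z_{C_2}\}}\big). \qquad (5.6) $$

Let $C_1 = A_1 \times B_1$, $C_2 = A_2 \times B_2$ and set

$$ k = \#(A_1 \cap A_2), \quad l = \#(B_1 \cap B_2), \quad V = (A_1 \cap A_2) \times (B_1 \cap B_2), \quad \rho_{kl} = kl/(nm). $$

Denote

$$ g(k,l) = E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Z_{C_1} \cap Z_{C_2}\}}\big). $$

In view of symmetry we can fix $C_1$, say $C_1 = \{1, \dots, n\} \times \{1, \dots, m\}$, and take the sums
over $C_2$ only. Therefore we have

$$ E_0(\tilde L_\pi^2) = G_{nm}^{-1} \sum_{C_2 \in \mathcal{C}_{nm}} g(k,l). \qquad (5.7) $$

Set $z_{kl}^2 = a^2 kl$, $\rho_{kl} = kl/(nm)$.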

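Lemma 5.1 The following inequalities hold true.

(1) We have

$$ g(k,l) \le E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\big) = \exp(z_{kl}^2) =: g_1(k,l). \qquad (5.8) $$

(2) Let $b > T_{nm}/(1 + \rho_{kl})$. Then

$$ g(k,l) \le E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Y_{C_1} \le T_{nm},\, Y_{C_2} \le T_{nm}\}}\big) \le \exp\Big(-(T_{nm} - b)^2 + \frac{T_{nm}^2\rho_{kl}}{1 + \rho_{kl}}\Big) =: g_2(k,l). \qquad (5.9) $$

(3) Let $k \ge \delta_1 n$, $l \ge \delta_1 m$, and $T_{kl} \le 2z_{kl}$. Then

$$ g(k,l) \le E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Y_V \le T_{kl}\}}\big) \le \exp\big(T_{kl}^2/2 - (T_{kl} - z_{kl})^2\big) =: g_3(k,l). \qquad (5.10) $$

The proof of Lemma 5.1 is given in the Appendix, Section 6.4.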
5.2.1 From hypergeometric to binomial distributions 

Observe that the right-hand side of (5.7) is the expectation of $g(X_1, X_2)$ over $X_1, X_2$ which
are independent and have hypergeometric distributions $\mathcal{HG}_1 = \mathcal{HG}(N,n,n)$ and $\mathcal{HG}_2 = \mathcal{HG}(M,m,m)$ respectively, i.e.,

$$ G_{nm}^{-1} \sum_{C_2 \in \mathcal{C}_{nm}} g(k,l) = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\, g(X_1, X_2). \qquad (5.11) $$

Let us compare a random variable $X$ having the hypergeometric distribution $\mathcal{HG} = \mathcal{HG}(N,n,n)$ with one having the binomial distribution $Bin = Bin(n, \tilde p)$, $\tilde p = n/(N - n)$.

Lemma 5.2 Under the binomial distribution, $X$ is stochastically larger than under the hypergeometric distribution, i.e. for any $x \in \mathbb{R}$,

$$ P_{\mathcal{HG}}(X \ge x) \le P_{Bin}(X \ge x). $$

This yields, for any non-decreasing function $g(x)$,

$$ E_{\mathcal{HG}}(g(X)) \le E_{Bin}(g(X)). $$

Proof. The first claim corresponds to Lemma 3 in Arias-Castro et al. [3]. The second
claim follows from Abel's transform of the series for the expectation. □
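The stochastic ordering in Lemma 5.2 can be checked exactly on a small instance. The sketch below is only an illustration: the values N = 50, n = 5 and all function names are assumptions made for the example, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, n, k):
    """P(X = k) for X ~ HG(N, n, n): n draws without replacement, n marked items among N."""
    return Fraction(comb(n, k) * comb(N - n, n - k), comb(N, n))

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p), with p given as a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tails(point_masses):
    """Return [P(X >= 0), P(X >= 1), ..., P(X >= n)] from the list of point masses."""
    out, acc = [], Fraction(0)
    for mass in reversed(point_masses):
        acc += mass
        out.append(acc)
    return out[::-1]

# Hypothetical small example (values chosen only for illustration).
N, n = 50, 5
p_tilde = Fraction(n, N - n)  # binomial parameter n / (N - n), as in the text

hg_tail = tails([hypergeom_pmf(N, n, k) for k in range(n + 1)])
bin_tail = tails([binom_pmf(n, p_tilde, k) for k in range(n + 1)])

# Stochastic dominance: every hypergeometric tail is below the binomial tail.
dominated = all(h <= b for h, b in zip(hg_tail, bin_tail))
print(dominated)
```

Such a check does not replace the proof; it only shows how the two tails compare for one choice of parameters.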

Let $P_{n,p}(k) = P_{Bin}(X = k)$, for some integer $k$, where $X$ has the binomial $Bin(n,p)$
distribution, and similarly $P_{N,n,n}(k) = P_{\mathcal{HG}}(X = k)$ for the hypergeometric distribution
$\mathcal{HG}(N,n,n)$ of $X$.

Lemma 5.3 Let $n \to \infty$, $p \to 0$, $p > 0$, $k \ge n/r(p)$ where $r(p) \ge 1$ for $p > 0$ small
enough, and $\log(r(p)) = o(\log(p^{-1}))$. Then

$$ \log(P_{n,p}(k)) \le k\log(p)(1 + o(1)). $$

The proof is given in the Appendix, Section 6.5.

Since $\tilde p \sim p$, these imply the following lemma.

Lemma 5.4 Under the assumptions of Lemma 5.3,

$$ \log(P_{N,n,n}(k)) \le k\log(p)(1 + o(1)). $$

Proof. In view of Lemmas 5.2 and 5.3 we have

$$ P_{N,n,n}(k) \le P_{\mathcal{HG}}(Z \ge k) \le P_{Bin}(Z \ge k) \sim P_{n,\tilde p}(k). \qquad □ $$

5.2.2 Evaluation of the expectation 

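Take any small $\delta > 0$. It suffices to consider the case

$$ b^2 = a^2 nm \sim (2 - \delta)(n\log(p^{-1}) + m\log(q^{-1})). \qquad (5.12) $$

This implies

$$ a^2 \asymp \frac{\log(p^{-1})}{m} + \frac{\log(q^{-1})}{n}. \qquad (5.13) $$

In order to evaluate the right-hand side of (5.11), let us first divide the expectation
into two parts $E_{\mathcal{HG}_1 \times \mathcal{HG}_2}(g(X_1, X_2)) = E_1 + E_2$, where

$$ E_1 = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1 a^2 < 1\}}\big), \quad E_2 = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1 a^2 \ge 1\}}\big). $$

We would like to show that $E_1 \le 1 + o(1)$ and $E_2 = o(1)$.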
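Evaluation of $E_1$

It follows from (5.13) and (2.8) that $X_1 = O(n/\log(q^{-1}))$ under $a^2 X_1 < 1$. By (5.8)
we have

$$ E_1 \le E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(\exp(a^2 X_1 X_2)\,\mathbb{1}_{\{X_1 a^2 < 1\}}\big) = E_{\mathcal{HG}_1}\big(E_{\mathcal{HG}_2}(\exp(a^2 X_1 X_2))\,\mathbb{1}_{\{X_1 a^2 < 1\}}\big). $$

In view of Lemma 5.2, for the binomial distribution $Bin_2 = Bin(m, \tilde q)$, $\tilde q = m/(M - m)$,

$$ E_{\mathcal{HG}_2}(\exp(a^2 X_1 X_2)) \le E_{Bin_2}(\exp(a^2 X_1 X_2)) = \big(1 + \tilde q(e^{a^2 X_1} - 1)\big)^m \le \exp\big(m\tilde q(e^{a^2 X_1} - 1)\big). $$

Observe that, for some $B > 0$, under the constraint $X_1 a^2 < 1$,

$$ \exp\big(m\tilde q(e^{a^2 X_1} - 1)\big) \le \exp(B m \tilde q a^2 X_1). $$

Taking the expectation over $X_1$ we similarly get

$$ E_1 \le E_{\mathcal{HG}_1}(\exp(B m \tilde q a^2 X_1)) \le E_{Bin_1}(\exp(B m \tilde q a^2 X_1)) = \big(1 + \tilde p(e^{B m \tilde q a^2} - 1)\big)^n \le \exp\big(n\tilde p(e^{B m \tilde q a^2} - 1)\big). $$

By (5.13) and (2.8), we have $a^2 m \asymp \log(p^{-1})$. By condition (2.7), we have $q\log(p^{-1}) = o(1)$, which yields $mqa^2 = o(1)$. Thus, $n\tilde p(e^{B m \tilde q a^2} - 1) \sim Bnpmqa^2 = o(1)$ by condition (2.9), and we get

$$ E_1 \le \exp(o(1)) = 1 + o(1). $$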

Evaluation of $E_2$

In order to evaluate $E_2$ take $\delta_1 > 0$ small enough such that $\delta_1 a^2 m \le \log(p^{-1})/2$ and
$\delta_1 a^2 n \le \log(q^{-1})/2$ (one can do it by conditions (5.13) and (2.8)). Divide $E_2$ into two
parts $E_2 = E_{21} + E_{22}$, where

$$ E_{21} = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1 a^2 \ge 1,\ X_2/m < \delta_1\}}\big), $$
$$ E_{22} = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1 a^2 \ge 1,\ X_2/m \ge \delta_1\}}\big). $$

Evaluation of $E_{21}$

Recall that $X_1 \ge Bn/\log(q^{-1})$ with some $B > 0$ under $a^2 X_1 \ge 1$, and
$\log r(p) = \log(\log(q^{-1})) = o(\log(p^{-1}))$ by (2.7). Applying Lemma 5.4 and since
$P_{M,m,m}(l) \le 1$, we get

$$ E_{21} \le \sum_{n \ge k \ge Bn/\log(q^{-1}),\ 0 \le l < \delta_1 m} \exp(a^2 kl)\,P_{N,n,n}(k)\,P_{M,m,m}(l) \le \sum_{n \ge k \ge Bn/\log(q^{-1}),\ 0 \le l < \delta_1 m} \exp\big(k(a^2 l - \log(p^{-1})(1 + o(1)))\big). $$

Observe that under the constraints in the sum,

$$ a^2 l \le \delta_1 a^2 m \le \log(p^{-1})(1/2 + o(1)), $$

which yields for the power in the exponent

$$ k\big(a^2 l - \log(p^{-1})(1 + o(1))\big) \le -k\log(p^{-1})(1/2 + o(1)) \le -a^{-2}\log(p^{-1})(1/2 + o(1)) \asymp -m. $$

Therefore we have $E_{21} \le nm\exp(-Bm) = o(1)$ for some $B > 0$ by condition (2.7).
Evaluation of $E_{22}$

In order to evaluate the term $E_{22}$ we divide it in two parts as well: $E_{22} = I_1 + I_2$, where

$$ I_1 = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1 a^2 \ge 1,\ X_1/n < \delta_1,\ X_2/m \ge \delta_1\}}\big), $$
$$ I_2 = E_{\mathcal{HG}_1 \times \mathcal{HG}_2}\big(g(X_1, X_2)\,\mathbb{1}_{\{X_1/n \ge \delta_1,\ X_2/m \ge \delta_1\}}\big). $$

The evaluation of $I_1$ is similar to the evaluation of $E_{21}$ and we get $I_1 = o(1)$.

Evaluation of $I_2$

Let us divide the set $\mathcal{H}$ of values $(k,l)$ of $(X_1, X_2)$ satisfying the constraints in $I_2$ into
two parts:

$$ \mathcal{H}_1 = \{(k,l) \in \mathcal{H} : k\log(p^{-1}) + l\log(q^{-1}) > 2\rho_{kl}(n\log(p^{-1}) + m\log(q^{-1}))\}, $$
$$ \mathcal{H}_2 = \{(k,l) \in \mathcal{H} : k\log(p^{-1}) + l\log(q^{-1}) \le 2\rho_{kl}(n\log(p^{-1}) + m\log(q^{-1}))\}. $$

This yields the division of $I_2$ into $I_2 = I_{12} + I_{22}$. Observe that $\rho_{kl} \ge \delta_1^2$ for $(k,l) \in \mathcal{H}$.

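Let us consider $I_{12}$. Recalling (5.9), observe that we can take $\delta > 0$ small enough
in (5.12) such that $t = T_{nm} - b(1 + \rho_{kl}) < 0$. Applying (5.9) and Lemma 5.4 for
$P_{N,n,n}(k)$, $P_{M,m,m}(l)$, we get

$$ I_{12} \le \sum_{(k,l) \in \mathcal{H}_1} \exp\Big(-(T_{nm} - b)^2 + \frac{T_{nm}^2\rho_{kl}}{1 + \rho_{kl}} - k\log(p^{-1}) - l\log(q^{-1}) + o(T_{nm}^2)\Big). $$

Observe that for $\delta > 0$ small enough in (5.12) one can take $\delta_2 = \delta_2(\delta) > 0$ such that
$(T_{nm} - b)^2 \ge \delta_2 T_{nm}^2$ for the first term in the exponent. Denote

$$ A = A_{n,p} = n\log(p^{-1}), \quad B = B_{m,q} = m\log(q^{-1}), $$

and recall that $T_{nm}^2 = 2\log(G_{nm}) \sim 2(A + B)$ by (2.6).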

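Set $x = k/n$, $y = l/m$. Then the terms in the power of the exponent above can be
rewritten in the form

$$ \frac{T_{nm}^2\rho_{kl}}{1 + \rho_{kl}} - k\log(p^{-1}) - l\log(q^{-1}) = \frac{(2 + o(1))(A + B)xy}{1 + xy} - Ax - By, $$

while the constraints in $\mathcal{H}_1$ are of the form $Ax + By > (2 + o(1))(A + B)xy$. This yields,
in $\mathcal{H}_1$ and for $N, M$ large enough,

$$ \frac{(2 + o(1))(A + B)xy}{1 + xy} - Ax - By \le \frac{Ax + By}{1 + xy} - (Ax + By) = -(Ax + By)\frac{xy}{1 + xy} \le 0. $$

Therefore

$$ I_{12} \le 2nm\exp\big(-(\delta_2 + o(1))T_{nm}^2\big) = o(1). $$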

Consider now the term $I_{22}$. Recalling (5.3) and (5.10), observe that the constraints in $\mathcal{H}_2$
correspond to $T_{kl}^2 \le 4(2 - \delta)^{-1} z_{kl}^2(1 + o(1)) < 4z_{kl}^2$, which implies $T_{kl} - 2z_{kl} < 0$ for $N, M$
large enough.
Applying (5.10) and Lemma 5.4 for $P_{N,n,n}(k)$, $P_{M,m,m}(l)$ and the inequality $\Phi(-x) \le \exp(-x^2/2)$ for $x > 0$, we similarly get

$$ I_{22} \le \sum_{(k,l) \in \mathcal{H}_2} \exp\big(z_{kl}^2 - (T_{kl} - 2z_{kl})^2/2 - (k\log(p^{-1}) + l\log(q^{-1}))(1 + o(1))\big). $$

Observe that the power in the exponent is of the form (up to a factor $(1 + o(1))$)

$$ z_{kl}^2 - (T_{kl} - 2z_{kl})^2/2 - T_{kl}^2/2 = -(z_{kl} - T_{kl})^2. $$

Under (5.12) (compare with (6.1)), recalling (6.2), we see that $(z_{kl} - T_{kl})^2 \ge \delta_2 T_{kl}^2$ for some
$\delta_2 > 0$. These yield $I_{22} = o(1)$.

Theorem 2.2 follows. □



6 Appendix 

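6.1 Proof of Propositions 2.1 and 2.2

Observe that $Y_C \sim \mathcal{N}(0,1)$ under $P_0$ and, since $G_{nm} \to \infty$ and $\Phi(-T) \asymp \exp(-T^2/2)/T$
as $T \to \infty$, we get

$$ \alpha(\psi^{max}) = P_0(t_{max} > T_{nm}) \le \sum_{C \in \mathcal{C}_{nm}} P_0(Y_C > T_{nm}) = G_{nm}\Phi(-T_{nm}) \to 0. $$

Similarly

Let $S_C \in \mathcal{S}_{nm,a}$. Then $Y_C \sim \mathcal{N}(g_{S_C}, 1)$ under $P_{S_C}$-probability with

$$ g_{S_C} = (nm)^{-1/2} \sum_{(i,j) \in C} s_{ij} \ge a\sqrt{nm}. $$

Observe that

$$ \beta(\psi^{max}, S_C) = P_{S_C}(t_{max} \le T_{nm}) \le P_{S_C}(Y_C \le T_{nm}) = \Phi(T_{nm} - g_{S_C}) \le \Phi(T_{nm} - a\sqrt{nm}). $$

Similarly,

$$ \beta(\psi_{NM}, S_C) = P_{S_C}(t_{NM,max} \le 1) \le P_{S_C}(Y_C \le V_{nm}) = \Phi(V_{nm} - g_{S_C}) \le \Phi(V_{nm} - a_{nm}\sqrt{nm}). $$

Proposition 2.2 follows. □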
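To get a feeling for the union bound $G_{nm}\Phi(-T_{nm})$ on the type I error of the scan test, it can be evaluated numerically with $T_{nm} = \sqrt{2\log G_{nm}}$. The sketch below is illustrative only: the matrix sizes are hypothetical, the function names are not from the paper, and the comparison value $1/(T_{nm}\sqrt{2\pi})$ comes from the standard Gaussian tail inequality $\Phi(-T) \le \varphi(T)/T$.

```python
from math import comb, erfc, log, pi, sqrt

def normal_upper_tail(t):
    """P(N(0,1) > t), computed via the complementary error function."""
    return 0.5 * erfc(t / sqrt(2.0))

def scan_type_one_bound(N, M, n, m):
    """Return (T_nm, G_nm * Phi(-T_nm)) with T_nm = sqrt(2 log G_nm)."""
    G = comb(N, n) * comb(M, m)   # number of n x m submatrices
    T = sqrt(2.0 * log(G))
    return T, G * normal_upper_tail(T)

# Hypothetical sizes, chosen only for illustration.
T, bound = scan_type_one_bound(30, 30, 3, 3)

# Since G * phi(T) = 1 / sqrt(2 pi) for this T, the tail inequality gives
# G * Phi(-T) <= 1 / (T sqrt(2 pi)), which tends to 0 as T grows.
mills = 1.0 / (T * sqrt(2.0 * pi))
print(T, bound, mills)
```

The bound decays only like $1/T_{nm}$, that is, like $(\log G_{nm})^{-1/2}$, so it is small but not negligible for moderate matrix sizes.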






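6.2 Proof of Proposition 5.1

It suffices to check that $P_0(Z_{nm}^c) \to 0$, where $A^c$ stands for the complement of the event
$A$. We have

$$ Z_{nm}^c = \bigcup_{C \in \mathcal{C}_{nm}}\ \bigcup_{k_0 \le k \le n,\ l_0 \le l \le m}\ \bigcup_{U \in \mathcal{C}_{kl,C}} \{Y_U > T_{kl}\} = \bigcup_{k_0 \le k \le n,\ l_0 \le l \le m}\ \bigcup_{U \in \mathcal{C}_{kl}} \{Y_U > T_{kl}\}. $$

Since $Y_U \sim \mathcal{N}(0,1)$ under $P_0$, we have, by definition of $T_{kl}$ and using the asymptotics
$\Phi(-x) \sim e^{-x^2/2}/(\sqrt{2\pi}\,x)$, $x \to \infty$,

$$ P_0(Z_{nm}^c) \le \sum_{k_0 \le k \le n,\ l_0 \le l \le m}\ \sum_{U \in \mathcal{C}_{kl}} P_0(Y_U > T_{kl}) = \sum_{k_0 \le k \le n,\ l_0 \le l \le m} G_{kl}\Phi(-T_{kl}) \le \sum_{k_0 \le k \le n,\ l_0 \le l \le m} \frac{1 + o(1)}{nm\sqrt{2\pi}\,T_{kl}} \to 0. $$

Proposition 5.1 follows. □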

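6.3 Proof of Proposition 5.2

In view of symmetry in $C$, it suffices to check that, for any fixed $C \in \mathcal{C}_{nm}$,

$$ E_0\big(\exp(-b^2/2 + bY_C)\,\mathbb{1}_{Z_C}\big) = P_{S_C}(Z_C) \to 1, $$

or, equivalently, $P_{S_C}(Z_C^c) \to 0$. Set $z_{kl}^2 = a^2 kl$. Since $Y_U \sim \mathcal{N}(z_{kl}, 1)$ under $P_{S_C}$ for
$U \in \mathcal{C}_{kl,C}$, we have

$$ P_{S_C}(Z_C^c) \le \sum_{k_0 \le k \le n,\ l_0 \le l \le m}\ \sum_{U \in \mathcal{C}_{kl,C}} \Phi(z_{kl} - T_{kl}) = \sum_{k_0 \le k \le n,\ l_0 \le l \le m} G_{kl}^{nm}\,\Phi(z_{kl} - T_{kl}), $$

where $G_{kl}^{nm} = \#(\mathcal{C}_{kl,C}) = C_n^k C_m^l$. Under the assumptions of Theorem 2.2 there exists $\delta > 0$
such that

$$ b^2 = a^2 nm \le (2 - \delta)(n\log(p^{-1}) + m\log(q^{-1})). \qquad (6.1) $$

Let us show that under (6.1) one has $z_{kl}^2 \le T_{kl}^2(1 - \delta/2 + o(1))$. In fact, since $\delta_1 n \le k \le n$, $\delta_1 m \le l \le m$, and by (5.3), we have

$$ z_{kl}^2 = a^2 kl \le (2 - \delta)\big(k(l/m)\log(p^{-1}) + l(k/n)\log(q^{-1})\big) \le (2 - \delta)\big(k\log(p^{-1}) + l\log(q^{-1})\big) \sim (1 - \delta/2)T_{kl}^2. \qquad (6.2) $$

Thus we get, for some $\delta_2 > 0$,

$$ \Phi(z_{kl} - T_{kl}) \le \exp(-\delta_2 T_{kl}^2). $$

Observe now that, under the constraints $\delta_1 n \le k \le n$, $\delta_1 m \le l \le m$, we have $\log(G_{kl}^{nm}) = O(n + m)$. This follows from evaluations similar to the proof of (2.6). On the other hand,
we have $T_{kl}^2 \sim 2(k\log(p^{-1}) + l\log(q^{-1})) \gg (n + m)$ under the same constraints. This yields

$$ \sum_{k_0 \le k \le n,\ l_0 \le l \le m} G_{kl}^{nm}\,\Phi(z_{kl} - T_{kl}) \le \sum_{k_0 \le k \le n,\ l_0 \le l \le m} \exp\big(O(n + m) - \delta_2 T_{kl}^2\big) \to 0. $$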

Proposition 5.2 follows. □

6.4 Proof of Lemma 5.1

The first inequalities in (5.8)-(5.10) are evident, and we will prove the second ones. The
proofs are based on the well known relation: if $X \sim \mathcal{N}(0,1)$, then

$$ E(\exp(\tau X)) = \exp(\tau^2/2), \quad \forall\, \tau \in \mathbb{R}. \qquad (6.3) $$

Let $V_1 = C_1 \setminus C_2$, $V_2 = C_2 \setminus C_1$, $V = C_1 \cap C_2$, and observe that the sets $V_1, V_2, V$ are
disjoint, $C_1 = V_1 + V$, $C_2 = V_2 + V$ and $\#(V_1) = \#(V_2) = nm - kl$, $\#(V) = kl$.
Let $0 < kl < nm$. Let us write the statistics $Y_{C_i}$ in a more convenient form

$$ Y_{C_1} = \sqrt{1 - \rho_{kl}}\,Y_{V_1} + \sqrt{\rho_{kl}}\,Y_V, \quad Y_{C_2} = \sqrt{1 - \rho_{kl}}\,Y_{V_2} + \sqrt{\rho_{kl}}\,Y_V, $$

where as above, for $U \subset \{1, \dots, N\} \times \{1, \dots, M\}$, we set

$$ Y_U = \frac{1}{\sqrt{\#(U)}} \sum_{(i,j) \in U} Y_{ij}. $$

Observe that $Y_{V_1}, Y_{V_2}, Y_V$ are standard Gaussian and independent under $P_0$.

Recall that $b = a\sqrt{nm}$ and put $c = b\sqrt{1 - \rho_{kl}}$. It is obvious that $b^2 = c^2 + z_{kl}^2$.
Moreover, by applying (6.3), we get

$$ E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\big) = E_0\big(\exp(-c^2/2 + cY_{V_1})\big) \cdot E_0\big(\exp(-c^2/2 + cY_{V_2})\big) \cdot E_0\big(\exp(-z_{kl}^2 + 2z_{kl}Y_V)\big) = \exp(z_{kl}^2). $$

If $kl = 0$ or $kl = nm$, we can prove this in a similar way. Lemma 5.1 (5.8) follows.

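In order to get the second inequality, observe that, for $0 < kl < nm$ and for any $h > 0$,

$$ E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Y_{C_1} \le T_{nm},\, Y_{C_2} \le T_{nm}\}}\big) \le e^{-b^2 + 2T_{nm}h}\,E_0\big(\exp((b - h)(Y_{C_1} + Y_{C_2}))\big) $$
$$ = e^{-b^2 + 2T_{nm}h}\,E_0\big(\exp((b - h)(1 - \rho_{kl})^{1/2}(Y_{V_1} + Y_{V_2}) + 2(b - h)\rho_{kl}^{1/2}Y_V)\big) $$
$$ = \exp\big(-b^2 + 2T_{nm}h + (b - h)^2(1 - \rho_{kl}) + 2(b - h)^2\rho_{kl}\big) $$
$$ = \exp\big(-b^2 + 2T_{nm}h + (b - h)^2(1 + \rho_{kl})\big). $$

Taking $h = b - T_{nm}/(1 + \rho_{kl})$, we get the second inequality. If $kl = 0$ or $kl = nm$, we can
prove this in a similar way. Lemma 5.1 (5.9) follows.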
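For convenience, the substitution of $h = b - T_{nm}/(1 + \rho_{kl})$ into the last exponent of the chain above can be written out explicitly. The computation below uses only that exponent and the definitions of this section, with $b - h = T_{nm}/(1 + \rho_{kl})$:

```latex
\begin{align*}
-b^2 + 2T_{nm}h + (b-h)^2(1+\rho_{kl})
 &= -b^2 + 2T_{nm}b - \frac{2T_{nm}^2}{1+\rho_{kl}} + \frac{T_{nm}^2}{1+\rho_{kl}} \\
 &= -(T_{nm}-b)^2 + T_{nm}^2 - \frac{T_{nm}^2}{1+\rho_{kl}} \\
 &= -(T_{nm}-b)^2 + \frac{T_{nm}^2\,\rho_{kl}}{1+\rho_{kl}} .
\end{align*}
```

This is the exponent of $g_2(k,l)$ in (5.9). The condition $b > T_{nm}/(1 + \rho_{kl})$ of the lemma is what guarantees $h > 0$.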

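In order to get the third inequality, for $0 < kl < nm$ and for $h > 0$, we have

$$ E_0\big(\exp(-b^2 + b(Y_{C_1} + Y_{C_2}))\,\mathbb{1}_{\{Y_V \le T_{kl}\}}\big) $$
$$ = E_0\big(\exp(-c^2/2 + cY_{V_1})\big) \cdot E_0\big(\exp(-c^2/2 + cY_{V_2})\big) \cdot E_0\big(e^{-z_{kl}^2}\exp(2z_{kl}Y_V)\,\mathbb{1}_{\{Y_V \le T_{kl}\}}\big) $$
$$ \le e^{-z_{kl}^2 + T_{kl}h}\,E_0\big(\exp((2z_{kl} - h)Y_V + h(Y_V - T_{kl}))\,\mathbb{1}_{\{Y_V \le T_{kl}\}}\big) $$
$$ \le e^{-z_{kl}^2 + T_{kl}h}\,E_0\big(\exp((2z_{kl} - h)Y_V)\big) = \exp\big(-z_{kl}^2 + T_{kl}h + (2z_{kl} - h)^2/2\big). $$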

Taking $h = 2z_{kl} - T_{kl}$, we get the third inequality. If $kl = nm$, we can prove this in a
similar way. Lemma 5.1 (5.10) follows. □

6.5 Proof of Lemma 5.3

Recall that $P_{n,p}(k) = C_n^k p^k(1 - p)^{n-k}$. If $k = n$, then $\log(P_{n,p}(n)) = n\log(p)$. Let $k < n$.
Using the Stirling formula and the inequality $1 + x \le e^x$ we get, as $n \to \infty$, $k \to \infty$, $n - k \ge 1$,

$$ P_{n,p}(k) \le \frac{n^{n+1/2}\,p^k\,e^{1/(12n)}}{\sqrt{2\pi}\,k^{k+1/2}(n-k)^{n-k+1/2}} \le \frac{1}{\sqrt{2\pi k}}\Big(\frac{np}{k}\Big)^k e^{k(1 + 1/(2(n-k))) + 1/(12n)}. $$

This yields

$$ \log(P_{n,p}(k)) \le k\big(\log(p) + \log(n/k) + O(1)\big). $$

Since $n/k \le r(p)$ we see that $0 \le \log(n/k) \le \log(r(p)) = o(\log(p^{-1}))$ under the assumption
on $r(p)$. Lemma 5.3 follows. □
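A non-asymptotic relative of the bound $\log(P_{n,p}(k)) \le k(\log(p) + \log(n/k) + O(1))$ can be checked numerically. The sketch below takes the $O(1)$ term equal to 1, which is an assumption made for the example: it corresponds to the elementary inequality $C_n^k \le (en/k)^k$ rather than to the Stirling argument of the proof. The values of n and p are hypothetical.

```python
from math import comb, log

def log_binom_pmf(n, p, k):
    """log P(X = k) for X ~ Bin(n, p)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1.0 - p)

def elementary_bound(n, p, k):
    """k (log p + log(n/k) + 1), i.e. the displayed bound with the O(1) term set to 1."""
    return k * (log(p) + log(n / k) + 1.0)

# Hypothetical parameters, chosen only for illustration.
n, p = 200, 0.01

# Non-negative gaps mean the bound holds for every k = 1, ..., n.
gaps = [elementary_bound(n, p, k) - log_binom_pmf(n, p, k) for k in range(1, n + 1)]
print(min(gaps))
```

When $k \ge n/r(p)$ with $\log r(p) = o(\log(p^{-1}))$, the terms $\log(n/k) + 1$ are negligible next to $\log(p^{-1})$, which is how the lemma follows.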

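6.6 Proof of Theorem 2.4
6.6.1 Proof of the lower bounds

Let $K = [N/n]$, $L = [M/m]$ and consider only non-overlapping rectangles

$$ R_{kl} = \{(i,j) : n(k-1) + 1 \le i \le nk,\ m(l-1) + 1 \le j \le ml\}, \quad 1 \le k \le K,\ 1 \le l \le L. $$

Let $S_{kl}$ be the matrix with the elements $s_{ij} = 0$ if $(i,j) \notin R_{kl}$ and $s_{ij} = a$ if $(i,j) \in R_{kl}$.
Consider the prior

$$ \pi = \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} \delta_{S_{kl}}. $$

By construction, $\pi(\{S_{kl}, k, l\}) = 1$. The likelihood ratio is of the form

$$ L(Y) = \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{dP_{S_{kl}}}{dP_0}(Y) = \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} \exp(-b^2/2 + bZ_{kl}), $$

where

$$ Z_{kl} = \frac{1}{\sqrt{nm}} \sum_{(i,j) \in R_{kl}} Y_{ij}, \quad b^2 = nma^2. $$

Note that $Z_{kl} \sim \mathcal{N}(0,1)$ under $P_0$ and are independent in $k, l$. It is sufficient to check that
$L(Y) \to 1$ in $P_0$-probability. Let us consider the truncated likelihood ratio

$$ \tilde L(Y) = \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} \exp(-b^2/2 + bZ_{kl})\,\mathbb{1}_{\{Z_{kl} \le T_{KL}\}}, $$

where we set

$$ T_{KL} = \sqrt{2\log(KL)} \sim \sqrt{2(\log(p^{-1}) + \log(q^{-1}))}. $$

Since

$$ P_0(L \ne \tilde L) \le \sum_{k=1}^{K} \sum_{l=1}^{L} P_0(Z_{kl} > T_{KL}) = KL\,\Phi(-T_{KL}) \to 0, $$

it suffices to check that $\tilde L(Y) \to 1$ in $P_0$-probability.

Observe now that $T_{KL} - b \to \infty$ under the assumptions of the Theorem, and it suffices to
consider the case $b \ge cT_{KL}$ for some $c \in (1/2, 1)$. We have

$$ E_0(\tilde L(Y)) = \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} E_0\big(\exp(-b^2/2 + bZ_{kl})\,\mathbb{1}_{\{Z_{kl} \le T_{KL}\}}\big) = \Phi(T_{KL} - b) \to 1, $$

$$ Var_0(\tilde L(Y)) = \frac{1}{(KL)^2} \sum_{k=1}^{K} \sum_{l=1}^{L} Var_0\big(\exp(-b^2/2 + bZ_{kl})\,\mathbb{1}_{\{Z_{kl} \le T_{KL}\}}\big) \le \frac{1}{(KL)^2} \sum_{k=1}^{K} \sum_{l=1}^{L} E_0\big(\exp(-b^2 + 2bZ_{kl})\,\mathbb{1}_{\{Z_{kl} \le T_{KL}\}}\big) $$
$$ = \frac{1}{KL}\exp(b^2)\,\Phi(T_{KL} - 2b) \le \exp\big(b^2 - (T_{KL} - 2b)^2/2 - T_{KL}^2/2\big) = \exp\big(-(T_{KL} - b)^2\big) \to 0. $$

Theorem 2.4 (1) follows. □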
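The block statistics $Z_{kl}$ over the non-overlapping rectangles $R_{kl}$ are simple to compute. The sketch below is a plain-Python illustration under stated assumptions: indices are 0-based, the function names are hypothetical, and the toy matrix is noise-free so that the output is deterministic. It is not the test procedure of Theorem 2.4 itself, which uses a finer grid of rectangles.

```python
from math import log, sqrt

def block_statistics(Y, n, m):
    """Z[k][l] = (nm)^(-1/2) * sum of Y over the k-th row block and l-th column block."""
    N, M = len(Y), len(Y[0])
    K, L = N // n, M // m   # numbers of non-overlapping blocks per direction
    Z = [[0.0] * L for _ in range(K)]
    for k in range(K):
        for l in range(L):
            total = sum(Y[i][j]
                        for i in range(n * k, n * (k + 1))
                        for j in range(m * l, m * (l + 1)))
            Z[k][l] = total / sqrt(n * m)
    return Z

def block_threshold(K, L):
    """T_KL = sqrt(2 log(KL))."""
    return sqrt(2.0 * log(K * L))

# Noise-free toy matrix (hypothetical sizes): one block carries the level a.
N, M, n, m, a = 6, 8, 2, 4, 3.0
Y = [[0.0] * M for _ in range(N)]
for i in range(2, 4):
    for j in range(4, 8):
        Y[i][j] = a

Z = block_statistics(Y, n, m)
T = block_threshold(len(Z), len(Z[0]))
reject = max(max(row) for row in Z) > T
print(Z, T, reject)
```

In this toy case the active block has $Z_{kl} = a\sqrt{nm} = b$, which is the mean shift appearing in the likelihood ratio above.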
6.6.2 Proof of the upper bounds

Set $T_{KL} = \sqrt{2\log(KL)}$ and observe that, by the choice of $\eta$ and since $pq \to 0$, we have

$$ T_{KL} = \sqrt{2\log(NM/(nm)) - 4\log(\eta) + o(1)} = \big(2\log((pq)^{-1})\big)^{1/2}\Big(1 - \frac{2\log(\eta) + o(1)}{\log((pq)^{-1})}\Big)^{1/2} \sim \sqrt{2(\log(p^{-1}) + \log(q^{-1}))}. $$

For type I errors, we have

$$ \alpha(\psi_Z) \le \sum_{k=1}^{K} \sum_{l=1}^{L} P_0(Z_{kl} > T_{KL}) = KL\,\Phi(-T_{KL}) \to 0. $$

Let the alternative $S_E$ correspond to the matrix with entries $a > 0$ at positions in
$E = E_{k^*l^*}$ and $0$ elsewhere. As previously, $E = E_{k^*l^*}$, $0 \le k^* \le N - n$, $0 \le l^* \le M - m$,
consists of $(i,j)$ such that $k^* < i \le k^* + n$, $l^* < j \le l^* + m$. By construction, we can take
$k, l$, $1 \le k \le K$, $1 \le l \le L$, such that $|nk - k^*| \le n\eta$, $|ml - l^*| \le m\eta$. Therefore, the
matrix $E_{k^*l^*}$ will overlap significantly with the matrix $E_{nk,ml}$ from our test procedure:

$$ \tilde n = \#\big(\{k^* + 1, \dots, k^* + n\} \cap \{nk + 1, \dots, nk + n\}\big) \ge n(1 - \eta), $$
$$ \tilde m = \#\big(\{l^* + 1, \dots, l^* + m\} \cap \{ml + 1, \dots, ml + m\}\big) \ge m(1 - \eta). $$

Observe that

$$ \beta(\psi_Z, S_E) \le P_{S_E}(Z_{kl} \le T_{KL}). $$

Moreover, $Z_{kl} \sim \mathcal{N}(\tilde b, 1)$ under $P_{S_E}$, where we recall that $b = a\sqrt{nm}$ and we put

$$ \tilde b = \frac{a\,\tilde n\,\tilde m}{\sqrt{nm}} \ge b(1 - \eta)^2 \sim b. $$

This yields

$$ \beta(\psi_Z, S_E) \le \Phi\big(T_{KL} - b(1 + o(1))\big) \to 0 $$

under the assumptions of the Theorem. Theorem 2.4 (2) follows. □



6.7 Proof of Theorem 4.1

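Let us see that $E_0(\hat\sigma^2) = \sigma^2$ and that $Var_0(\hat\sigma^2) = 2\sigma^4/(NM)$. Denote by

$$ O_B = \Big\{ \Big|\frac{\hat\sigma^2}{\sigma^2} - 1\Big| \le \frac{B}{\sqrt{NM}} \Big\}, $$

with $B \to \infty$ such that $B/\sqrt{NM} \to 0$. Then, by the Chebyshev inequality, $P_0((O_B)^c) \le 2B^{-2} = o(1)$.

It is easy to see that $P_0(\hat t_{lin} > H) \le P_0(t_{lin}/\sigma > H\hat\sigma/\sigma,\ O_B) + P_0((O_B)^c) \le P_0(t_{lin}/\sigma > \tilde H) + o(1) = o(1)$, as $\tilde H^2 = H^2(1 - B/\sqrt{NM}) \to \infty$.

Similarly, for $\hat t_{max}$, we put $\tilde T_{nm}^2 = T_{nm}^2(1 - B/\sqrt{NM})$ and then $P_0(\hat t_{max} > T_{nm}) \le P_0(t_{max}/\sigma > \tilde T_{nm}) + P_0((O_B)^c) = o(1)$, for our choice of $T_{nm}$.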
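Under the alternative, we can decompose

$$ \frac{\hat\sigma^2}{\sigma^2} = \frac{1}{NM} \sum_{i,j} \xi_{ij}^2 + \frac{2}{\sigma NM} \sum_{(i,j) \in C} s_{ij}\xi_{ij} + \frac{1}{\sigma^2 NM} \sum_{(i,j) \in C} s_{ij}^2. $$

We get

$$ E_S(\hat\sigma^2) = \sigma^2(1 + G), \quad Var_S(\hat\sigma^2) = \frac{2\sigma^4}{NM}(1 + 2G) \quad \text{and} \quad G = \frac{1}{\sigma^2 NM} \sum_{(i,j) \in C} s_{ij}^2. $$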



Denote by R := 5^/^(1 + 2G) and by 



$$ O_{S,B} = \Big\{ \Big|\frac{\hat\sigma^2}{\sigma^2} - 1 - G\Big| \le R \Big\}, $$



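with $B \to \infty$ such that $B^2 = o((a/\sigma)\sqrt{nmNM})$, which implies that $R = o(E_S(t_{lin}/\sigma))$.
Then, on the one hand, $P_S((O_{S,B})^c) \le Var_S(\hat\sigma^2)/(\sigma^4 R^2) = o(1)$. On the other hand, we have

$$ P_S(\hat t_{lin} \le H) \le P_S(t_{lin}/\sigma \le H\hat\sigma/\sigma,\ O_{S,B}) + P_S((O_{S,B})^c) $$
$$ \le P_S\big(t_{lin}/\sigma \le H\sqrt{1 + G + R}\big) + o(1) $$
$$ \le P_S\big(E_S(t_{lin}/\sigma) - t_{lin}/\sigma \ge E_S(t_{lin}/\sigma) - H\sqrt{1 + G + R}\big) + o(1) $$
$$ \le \big(E_S(t_{lin}/\sigma) - H\sqrt{1 + G + R}\big)^{-2} + o(1). $$

Now, $E_S(t_{lin}/\sigma) = (NM)^{-1/2} \sum_{(i,j) \in C} s_{ij}/\sigma \ge anm/(\sigma\sqrt{NM}) = (a/\sigma)\sqrt{nmpq}$. We also
have $H\sqrt{1 + G + R} \le H(1 + \sqrt{G} + \sqrt{R}) = o(E_S(t_{lin}/\sigma))$. This proves that $P_S(\hat t_{lin} \le H) \to 0$ if $E_S(t_{lin}/\sigma) \to \infty$.

In order to finish the proof it is enough to see that

$$ P_S(\psi^* = 0) \le \min\{P_S(\hat t_{lin} \le H),\ P_S(\hat t_{max} \le T_{nm})\}, $$

which tends to 0 if $E_S(t_{lin}/\sigma) \to \infty$.

If $E_S(t_{lin}/\sigma) = O(1)$, we have $\sqrt{1 + G + R} = 1 + o(1)$. It gives

$$ P_S(\hat t_{max} \le T_{nm}) \le P_S(t_{max}/\sigma \le T_{nm}\hat\sigma/\sigma,\ O_{S,B}) + P_S((O_{S,B})^c) \le P_S\big(t_{max}/\sigma \le T_{nm}\sqrt{1 + G + R}\big) + o(1) = o(1), $$

for our choice of $T_{nm}$. □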
6.8 Proof of the upper bound of Theorem 4.2

It follows the same lines as that of Theorem 2.1. We use the Markov inequality and bound
from above the exponential moments of our test statistics (as they do not have a Gaussian
distribution in this case).

We use repeatedly the well known facts that $A'(s^0) = 0$ and $A''(s^0) = 1$ for a centered
and reduced random variable at $s^0$. Moreover,

$$ E_s[e^{uY_{ij}}] = \exp(A(s + u) - A(s)) $$

for any $s$ and $u$ such that $s$ and $s + u$ are interior points of the parameter space. For the
statistic $t_{lin}$, we have

$$ \alpha(\psi^{lin}) = P_{s^0}(t_{lin}H > H^2) \le e^{-H^2}E_{s^0}[e^{Ht_{lin}}] \le e^{-H^2} \prod_{i,j} E_{s^0}[e^{HY_{ij}/\sqrt{NM}}] \le \exp\Big(\big(A(s^0 + \tfrac{H}{\sqrt{NM}}) - A(s^0)\big)NM - H^2\Big). $$

For $H \to \infty$ as in our theorem, $H/\sqrt{NM} \to 0$; then we get $A(s^0 + \frac{H}{\sqrt{NM}}) - A(s^0) = \frac{H^2}{2NM}(1 + o(1))$ and $\alpha(\psi^{lin}) \le c\,e^{-H^2/2\,(1 - o(1))} \to 0$, for some constant $c > 0$.

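Under the alternative,

$$ \beta(\psi^{lin}, S) = P_S[-t_{lin} + H \ge 0] \le e^{H}E_S[e^{-t_{lin}}] = \exp\Big(H + \sum_{i,j}\big[A(s_{ij} - \tfrac{1}{\sqrt{NM}}) - A(s_{ij})\big]\Big). $$

On the one hand, $A(s^0 - \frac{1}{\sqrt{NM}}) - A(s^0) \sim \frac{1}{2NM}$. On the other hand, it is easy to
check that $A''(s) > 0$. Thus, $A$ is a convex function and $A'$ is increasing. This implies
$A(s_{ij} - \frac{1}{\sqrt{NM}}) - A(s_{ij}) \le -\frac{1}{\sqrt{NM}}A'(s_{ij})$, which is less than or equal to $-\frac{1}{\sqrt{NM}}A'(s^0 + a)$ under
the alternative. Finally,

$$ \beta(\psi^{lin}, S) \le \exp\Big(H + \frac{1}{2}(1 - pq) - A'(s^0 + a)\frac{nm}{\sqrt{NM}}\Big) \le c\exp\big(H - A'(s^0 + a)\sqrt{nmpq}\big) \to 0 $$

under our assumption.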
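For the statistic $t_{max}$,

$$ \alpha(\psi^{max}) = P_{s^0}(t_{max} > T_{nm}) \le \sum_{C} P_{s^0}(T_{nm}Y_C > T_{nm}^2) \le \sum_{C} e^{-T_{nm}^2}E_{s^0}[e^{T_{nm}Y_C}] = G_{nm}\exp\Big(-T_{nm}^2 + nm\big(A(s^0 + \tfrac{T_{nm}}{\sqrt{nm}}) - A(s^0)\big)\Big). $$

As $T_{nm}/\sqrt{nm} \asymp (\log(p^{-1})/m + \log(q^{-1})/n)^{1/2} \to 0$, we have $A(s^0 + T_{nm}/\sqrt{nm}) - A(s^0) \sim T_{nm}^2/(2nm)$ and this gives

$$ \alpha(\psi^{max}) \le G_{nm}\exp\big(-\tfrac{T_{nm}^2}{2}(1 + o(1))\big) \to 0. $$

For the second-type error,

$$ \beta(\psi^{max}, S) \le P_S[-Y_C + T_{nm} \ge 0] \le e^{T_{nm}}E_S[e^{-Y_C}] \le \exp\Big(T_{nm} + \sum_{(i,j) \in C}\big[A(s_{ij} - \tfrac{1}{\sqrt{nm}}) - A(s_{ij})\big]\Big) \le \exp\Big(T_{nm} - nmA'(s^0 + a)\frac{1}{\sqrt{nm}}\Big) \to 0, $$

by the choice of $\delta > 0$ small enough. □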

6.9 Proof of Theorem 4.3
6.9.1 Proof of the upper bounds

Under the null hypothesis, $\sqrt{2NM}\,z_{lin} + NM$ has a $\chi^2_{NM}$ distribution, so that $E_0(z_{lin}) = 0$
and $Var_0(z_{lin}) = 1$. This implies that $P_0(z_{lin} > H) \to 0$ as $H \to \infty$.

For $z_{max}$ we will use the moment generating function of the $\chi^2$ distribution. We have

$$ P_0(z_{max} > T_{nm}) \le \sum_{C} P_0(Z_C > T_{nm}) \le G_{nm}P_0(T_{nm}Z_C > T_{nm}^2) $$
$$ \le G_{nm}\exp\Big(-T_{nm}^2 - T_{nm}\sqrt{\tfrac{nm}{2}} - \tfrac{nm}{2}\log\Big(1 - T_{nm}\sqrt{\tfrac{2}{nm}}\Big)\Big) \le G_{nm}e^{-\frac{T_{nm}^2}{2}(1 + o(1))} = o(1), $$

by the choice of $T_{nm}$ in our theorem. Indeed, $2T_{nm}^2/(nm) = O(a^4) \to 0$, by assumption.

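Again, $\beta(\psi^*, S) \le \min\{P_S(z_{lin} \le H),\ P_S(z_{max} \le T_{nm})\}$. Under the alternative, $S = S_C$ and $z_{lin}$ has mean $E_S(z_{lin}) = \lambda/\sqrt{2NM}$ and variance $Var_S(z_{lin}) = 1 + 2\lambda/(NM)$,
where $\lambda = \sum_{(i,j) \in C} s_{ij}^2$. We have

$$ E_S(z_{lin}) = \frac{\lambda}{\sqrt{2NM}} \ge \frac{a^2 nm}{\sqrt{2NM}} = \frac{a^2}{\sqrt{2}}\sqrt{nmpq}. $$

Therefore, if $a^2\sqrt{nmpq} \to \infty$, we have

$$ P_S(z_{lin} \le H) \le \frac{Var_S(z_{lin})}{(E_S(z_{lin}) - H)^2} \le \frac{1 + 2\lambda/(NM)}{(1 - c)^2\lambda^2/(2NM)} \le \frac{2}{(1 - c)^2 a^4 nmpq} + \frac{4}{(1 - c)^2 a^2 nm} = o(1). $$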



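Under the alternative,

$$ P_{S_C}(z_{max} \le T_{nm}) \le P_{S_C}(Z_C \le T_{nm}) \le e^{T_{nm}}E_{S_C}[e^{-Z_C}] = e^{T_{nm} - nmt}\,E_{S_C}\exp\Big(t\sum_{(i,j) \in C} Y_{ij}^2\Big), $$

where $t = -1/\sqrt{2nm} < 1/2$. Therefore,

$$ E_{S_C}\exp\Big(t\sum_{(i,j) \in C} Y_{ij}^2\Big) = \exp\Big(\frac{\lambda t}{1 - 2t} - \frac{nm}{2}\log(1 - 2t)\Big) $$
$$ \le \exp\Big(-\frac{\lambda}{\sqrt{2nm}}\cdot\frac{1}{1 + \sqrt{2/(nm)}} - \frac{nm}{2}\Big(\sqrt{\frac{2}{nm}} - \frac{1}{nm}\Big)\Big) $$
$$ \le \exp\Big(-\frac{a^2\sqrt{nm}}{\sqrt{2}}\cdot\frac{1}{1 + \sqrt{2/(nm)}} - \sqrt{\frac{nm}{2}} + \frac{1}{2}\Big). $$

In conclusion, if $\liminf a^4 nm/(4(n\log(p^{-1}) + m\log(q^{-1}))) > 1$ we have

$$ P_{S_C}(z_{max} \le T_{nm}) \le \sqrt{e}\,\exp\Big(T_{nm} - \frac{a^2\sqrt{nm}}{\sqrt{2}}\cdot\frac{1}{1 + \sqrt{2/(nm)}}\Big). $$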

6.9.2 Proof of the lower bounds

We follow the lines of the proof of Theorem 2.2. The prior on the set of matrices is
$\pi = G_{nm}^{-1}\sum_{C \in \mathcal{C}_{nm}} \pi_C$, where, under $\pi_C$, the matrix $S = S_C$ has $s_{ij} = 0$ with probability 1
for all $(i,j) \notin C$ and $s_{ij}$ is either $a$ or $-a$ with probability 1/2, for all $(i,j) \in C$.

Let $P_{S_C}$ denote the likelihood of the random variables in $Y$ when $S = S_C$ and let $P_\pi$
denote the mixture of likelihoods $P_\pi = G_{nm}^{-1}\sum_{C \in \mathcal{C}_{nm}} P_{S_C}$. Therefore, the likelihood ratio
$L_\pi(Y)$ is

$$ L_\pi(Y) = \frac{dP_\pi}{dP_0}(Y) = \frac{1}{G_{nm}} \sum_{C \in \mathcal{C}_{nm}} \frac{dP_{S_C}}{dP_0}(Y) = \frac{1}{G_{nm}} \sum_{C \in \mathcal{C}_{nm}} e^{-a^2 nm/2}\exp\Big(\sum_{(i,j) \in C} \log(\cosh(aY_{ij}))\Big). $$



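Note that $E_0[\cosh(aY_{ij})] = e^{a^2/2}$ and that $E_0[\cosh^2(aY_{ij})] = (1 + e^{2a^2})/2$. We can reproduce the proof of Theorem 2.2 with $a^2$ replaced by $a^4/2$. For example, in (5.8) we
have

$$ g(k,l) = E_0\Big[\exp\Big(-a^2 nm + \sum_{(i,j) \in C_1} \log(\cosh(aY_{ij})) + \sum_{(i,j) \in C_2} \log(\cosh(aY_{ij}))\Big)\mathbb{1}_{\{Z_{C_1} \cap Z_{C_2}\}}\Big]. $$

We can show, as in the proof of (5.8), that

$$ g(k,l) \le e^{-a^2 nm}E_0\Big[\exp\Big(\sum_{(i,j) \in V_1 \cup V_2} \log(\cosh(aY_{ij})) + 2\sum_{(i,j) \in V} \log(\cosh(aY_{ij}))\Big)\Big] $$
$$ \le e^{-a^2 nm}\,e^{2(nm - kl)a^2/2}\big(E_0[\cosh^2(aY)]\big)^{kl} \le e^{-a^2 kl}\Big(\frac{1 + e^{2a^2}}{2}\Big)^{kl}, $$

where $V_1$, $V_2$ and $V$ are defined in the proof of Lemma 5.1.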
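The two Gaussian moments of $\cosh(aY)$ quoted above can be confirmed by simple numerical integration. The following sketch uses a plain trapezoidal rule on a truncated interval; the value $a = 0.7$, the integration window and the function names are assumptions made for the illustration only.

```python
from math import cosh, exp, pi, sqrt

def gaussian_expectation(f, half_width=12.0, steps=24000):
    """Approximate E f(Y), Y ~ N(0,1), by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * f(y) * exp(-0.5 * y * y)
    return total * h / sqrt(2.0 * pi)

a = 0.7  # hypothetical signal level, for illustration only

first_moment = gaussian_expectation(lambda y: cosh(a * y))
second_moment = gaussian_expectation(lambda y: cosh(a * y) ** 2)

print(first_moment, exp(a * a / 2.0))
print(second_moment, (1.0 + exp(2.0 * a * a)) / 2.0)
```

Both identities follow from (6.3), since $\cosh(x) = (e^x + e^{-x})/2$ and $\cosh^2(x) = (1 + \cosh(2x))/2$.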

The relations (5.9) and (5.10) could be replaced by the following:

$$ g(k,l) \le \exp\Big(-(T_{nm} - b)^2 + \frac{T_{nm}^2\rho_{kl}}{1 + \rho_{kl}} + o(T_{nm}^2)\Big), \qquad (6.4) $$

$$ g(k,l) \le \exp\big(T_{kl}^2/2 - (T_{kl} - z_{kl})^2 + o(T_{kl}^2)\big), \qquad (6.5) $$

where $b^2 = nma^4/2$, $z_{kl}^2 = b^2\rho_{kl}$, under the same constraints. The inspection of the proofs
of (5.9) and (5.10) shows that, in order to prove (6.4) and (6.5), one could use the following
relation in place of (6.3):



for $a \to 0$ and $\tau \in \mathbb{R}_+$, $\tau = O(1)$.

In order to prove (|6.6p . we can split the expected value over the events {ra^y^ > 5^} 
and {ra^y^ < 5^} respectively, for some small enough 6 > such that ^ja^fr — )• oo (we 
choose (5 = (ra^)^/^). Firstly, we use the inequality cosh(x) < e^^/^ and get 



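$$ E_0\big[\cosh^\tau(aY)\,\mathbb{1}_{\{\tau a^2 Y^2 > \delta^2\}}\big] \le E_0\big[e^{\tau a^2 Y^2/2}\,\mathbb{1}_{\{\tau a^2 Y^2 > \delta^2\}}\big] = \frac{2}{\sqrt{2\pi}}\int_{\delta/(a\sqrt{\tau})}^{\infty} e^{-(1 - \tau a^2)y^2/2}\,dy \le \sqrt{\frac{2}{\pi}}\,\frac{a\sqrt{\tau}}{\delta(1 - \tau a^2)}\exp\Big(-\frac{1 - \tau a^2}{2}\cdot\frac{\delta^2}{\tau a^2}\Big). $$

Secondly, on the event $\{\tau a^2 Y^2 \le \delta^2\}$ we use the Taylor expansions $\log(\cosh(x)) = x^2/2 - \frac{x^4}{12}(1 + o(1))$, $e^x = 1 + x + \frac{x^2}{2}(1 + o(1))$, $x = o(1)$. Denote $U = \log(\cosh(aY)) - E_0(\log(\cosh(aY)))$. We have

$$ E_0(\log(\cosh(aY))) = \frac{a^2}{2} - \frac{a^4}{4}(1 + o(1)), \quad Var_0(U) = \frac{a^4}{2}(1 + o(1)), $$

and, since $\tau U = o(1)$,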

= 1 + 



[1 + TU+ ^(1 + 0(1))) • ]I™2y2<52 



r2Varo(C/) 



:i + rC/ + ^(1 + 0(1))) • ]I™2y2>52 



The last expected value is o(T^a'^) and this gives 

SoK'°s{cosh(ay)) . 2^^^^^^^^] = e-i^O [log cosh(ay)] + 41(1+0(1))^ 

Together with the first relation in (|6.7[) , this ends the proof of (I6.6P . 



(6.7) 



□ 



References 

[1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010) On combinatorial 
testing problems. Ann. Statist. 38 (5), 3063-3092. 

[2] Arias-Castro, E., Candes, E.J. and Durand, A. (2011) Detection of an anomalous
cluster in a network. Ann. Statist. 39 (1), 278-304.

[3] Arias-Castro, E., Candes, E.J. and Plan, Y. (2010) Global Testing and Sparse Alternatives:
ANOVA, Multiple Comparisons and the Higher Criticism. arXiv:1007.1434.

[4] Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009) Simultaneous analysis of Lasso and
Dantzig selector. Ann. Statist. 37 (4), 1705-1732.

[5] Donoho, D. and Jin, J. (2004) Higher criticism for detecting sparse heterogeneous 
mixtures. Ann. Statist. 32, 962-994. 

[6] Ingster, Yu.I. (1997) Some problems of hypothesis testing leading to infinitely divisible 
distributions. Math. Methods of Stat. 6, 47-69. 

[7] Ingster, Yu.I. and Suslina, I.A. (2002) On a detection of a signal of known shape in
multichannel system. Zapiski Nauchn. Sem. POMI 294, 88-112 (in Russian, Transl.
J. Math. Sci. 127, 1723-1736).

[8] Koltchinskii, V., Lounici, K. and Tsybakov, A.B. (2011) Nuclear norm penalization and
optimal rates for noisy low rank matrix completion. Ann. Statist., to appear.






[9] Sun, X. and Nobel, A.B. (2010) On the maximal size of Large-Average and ANOVA-fit
Submatrices in a Gaussian Random Matrix. arXiv:1009.0562v1.

[10] Shabalin, A.A., Weigman, V.J., Perou, C.M. and Nobel, A.B. (2009) Finding Large
Average Submatrices in High Dimensional Data. Ann. Appl. Statist. 3, 985-1012.






